[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caleb Rackliffe updated CASSANDRA-15229:
----------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Caching
> Reporter: Benedict Elliott Smith
> Assignee: Zhao Yang
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we need to either change our behaviour to handle uncorrelated lifetimes or use something else. This is particularly important with the default chunk size for compressed sstables being reduced. If we address the problem, we should also utilise the BufferPool for native transport connections like we do for internode messaging, and reduce the number of pooling solutions we employ.
>
> Probably the best thing to do is to improve BufferPool's behaviour when used for things with uncorrelated lifetimes, which essentially boils down to tracking those chunks that have not been freed and re-circulating them when we run out of completely free blocks. We should probably also permit instantiating separate {{BufferPool}} instances, so that we can insulate internode messaging from the {{ChunkCache}}, or at least have separate memory bounds for each, and only share fully-freed chunks.
>
> With these improvements we can also safely increase the {{BufferPool}} chunk size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce the amount of global coordination and per-allocation overhead. We don't need 1KiB granularity for allocations, nor 16-byte granularity for tiny allocations.
>
> ----
> Since CASSANDRA-5863, the chunk cache has been implemented on top of the buffer pool. When the local pool is full, one of its chunks is evicted, and it is only put back into the global pool once all buffers in the evicted chunk have been released. Because of the chunk cache, buffers can be held for a long period of time, which prevents the evicted chunk from being recycled even though most of its space is free.
>
> Two things need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool even if it is not fully free. This is doable in 4.0.
> 2. Reduce the fragmentation caused by different buffer sizes. With #1, a partially freed chunk will be available for allocation, but the "holes" in it have different sizes. We should consider allocating a fixed buffer size, which is unlikely to fit in 4.0.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
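The recirculation idea in the CASSANDRA-15229 description above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Cassandra's `BufferPool`: the class `RecirculatingPoolSketch`, its `Chunk` type, the 64-slot bitmap and all method names are hypothetical, the code is single-threaded, and it uses fixed-size slots, which sidesteps the fragmentation problem the ticket lists as item 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded sketch; not the actual Cassandra BufferPool.
public class RecirculatingPoolSketch {
    /** A chunk split into 64 fixed-size slots; a set bit means the slot is free. */
    static final class Chunk {
        long freeBits = -1L; // all 64 slots start free

        boolean isFullyFree() { return freeBits == -1L; }
        boolean hasFree() { return freeBits != 0L; }

        /** Takes the lowest free slot and returns its index, or -1 if none is free. */
        int allocate() {
            if (freeBits == 0L) return -1;
            int slot = Long.numberOfTrailingZeros(freeBits);
            freeBits &= ~(1L << slot);
            return slot;
        }

        void release(int slot) { freeBits |= 1L << slot; }
    }

    private final Deque<Chunk> fullyFree = new ArrayDeque<>();
    private final Deque<Chunk> partiallyFree = new ArrayDeque<>();

    /** Called when a local pool evicts a chunk: keep it allocatable if any slot is free. */
    void recycle(Chunk chunk) {
        if (chunk.isFullyFree()) fullyFree.add(chunk);
        else if (chunk.hasFree()) partiallyFree.add(chunk);
        // A fully occupied chunk is simply dropped in this sketch;
        // a real pool would keep tracking it until buffers are released.
    }

    /** Prefers completely free chunks and falls back to partially freed ones. */
    Chunk acquire() {
        Chunk chunk = fullyFree.poll();
        return chunk != null ? chunk : partiallyFree.poll();
    }
}
```

The point of the two queues is that a chunk pinned by one long-lived buffer no longer strands the rest of its space: once evicted, it can still serve allocations when no completely free chunk is left.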
[jira] [Commented] (CASSANDRA-16114) Fix tests CQLTester.assertLastSchemaChange causes ClassCastException
[ https://issues.apache.org/jira/browse/CASSANDRA-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214442#comment-17214442 ]

Berenguer Blasi commented on CASSANDRA-16114:
---------------------------------------------

[~cedric.nabaa] are you still planning on working on this one?

> Fix tests CQLTester.assertLastSchemaChange causes ClassCastException
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-16114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16114
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: David Capwell
> Assignee: Cedric Nabaa
> Priority: Normal
> Fix For: 4.0-beta
>
>
> Build: https://app.circleci.com/pipelines/github/dcapwell/cassandra/494/workflows/b3765545-7b09-48dd-85ff-830c4f348329/jobs/2681
> {code}
> java.lang.ClassCastException: org.apache.cassandra.transport.messages.ResultMessage$Void cannot be cast to org.apache.cassandra.transport.messages.ResultMessage$SchemaChange
>     at org.apache.cassandra.cql3.CQLTester.assertLastSchemaChange(CQLTester.java:916)
>     at org.apache.cassandra.cql3.validation.entities.UFTest.testSchemaChange(UFTest.java:94)
> {code}
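The failure quoted in CASSANDRA-16114 above is an unchecked downcast hitting a result of a different type. As a hedged illustration only (this is not the fix adopted for the ticket, and `Result`, `VoidResult`, `SchemaChangeResult` and `lastSchemaChange` are hypothetical stand-ins for the real `ResultMessage` classes), a test helper can check the type first so that the failure names what was actually received:

```java
// Hypothetical stand-in types; not the Cassandra ResultMessage hierarchy.
public class SchemaChangeCheckSketch {
    interface Result {}

    static final class VoidResult implements Result {}

    static final class SchemaChangeResult implements Result {
        final String change;
        SchemaChangeResult(String change) { this.change = change; }
    }

    /** Returns the schema change, or fails with a descriptive message instead of a ClassCastException. */
    static String lastSchemaChange(Result last) {
        if (!(last instanceof SchemaChangeResult)) {
            String actual = last == null ? "null" : last.getClass().getSimpleName();
            throw new AssertionError("Expected a schema change result but got " + actual);
        }
        return ((SchemaChangeResult) last).change;
    }
}
```

The check does not make the test pass; it only turns an opaque cast failure into a message that says which kind of result the last statement produced.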
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214434#comment-17214434 ]

Berenguer Blasi commented on CASSANDRA-15996:
---------------------------------------------

The only two things that came to my mind are:
- On node start, instead of relying on the patient CQL connection, let's add flags to wait for the binary protocol and other startup stuff to complete i.e . But this is just a stab in the dark based on previous experience fixing tests, just in case there is some esoteric race at startup.
- {{NoSpamLogger}} has some shuffling of instances around that _maybe_ has a concurrency hole; _maybe_ I am just imagining things. I have to look at it a bit longer to make up my mind. In any case, I don't see how that could matter in this particular case, where usage is pretty straightforward and not multithreaded.

So I am at a loss here so far as well.

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E   assert []
> {code}
[jira] [Updated] (CASSANDRA-16152) In-JVM dtest - modify schema with stopped nodes and use yaml fragments for config
[ https://issues.apache.org/jira/browse/CASSANDRA-16152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-16152:
-------------------------------------
    Reviewers: Alex Petrov, David Capwell, Dinesh Joshi, Yifan Cai  (was: David Capwell, Dinesh Joshi, Yifan Cai)

> In-JVM dtest - modify schema with stopped nodes and use yaml fragments for config
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-16152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16152
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest/java
> Reporter: Jon Meredith
> Assignee: Jon Meredith
> Priority: Normal
>
> Some convenience improvements to in-JVM dtest that are useful across versions that I needed while working on CASSANDRA-16144
> * Add support for changing schema with stopped nodes.
> * Make it simpler to modify nested configuration items by specifying Yaml fragments
[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination
[ https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214286#comment-17214286 ]

Benedict Elliott Smith commented on CASSANDRA-16180:
----------------------------------------------------

{quote}I'd also propose that we leave Paxos/CAS out of scope for this issue.
{quote}
Yes, that's probably best - there's a related ticket where Sylvain and I have both expanded Paxos test coverage anyway, and besides this I think it is better to wait until post 4.0 (shortly after which I hope the Paxos landscape will materially improve in the project)

> 4.0 quality testing: Coordination
> ---------------------------------
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
> Issue Type: Task
> Components: Test/unit
> Reporter: Andres de la Peña
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py]. We should identify which other tests cover this and identify what should be extended, similarly to what has been done with CASSANDRA-15977.
[jira] [Created] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff
Yifan Cai created CASSANDRA-16211:
-----------------------------------

    Summary: Improve job metadata queries exception handling in cassandra-diff
    Key: CASSANDRA-16211
    URL: https://issues.apache.org/jira/browse/CASSANDRA-16211
    Project: Cassandra
    Issue Type: Improvement
    Components: Tool/diff
    Reporter: Yifan Cai
    Assignee: Yifan Cai


The job metadata tracks the progress of the diff job. Sometimes a job can fail because a progress update query fails.

The progress update queries can be categorized into two groups, critical and trivial.

When a query fails to update a trivial status (e.g. ProgressTracker), we would mostly prefer to continue the job and just log the failure.

When a query fails to update a critical status (e.g. JobLifeCycle), we can apply a client-side retry strategy (e.g. exponential backoff) in addition to the retry policy.
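The critical/trivial split proposed in CASSANDRA-16211 above can be sketched as follows. This is a minimal sketch under assumptions, not cassandra-diff code: `MetadataUpdateRetry`, `Criticality`, `runUpdate` and its parameters are hypothetical names, the update is modelled as a plain `Runnable`, and the driver-level retry policy mentioned in the ticket is outside the sketch.

```java
// Hypothetical sketch of the proposed handling; not taken from cassandra-diff.
public class MetadataUpdateRetry {
    /** Mirrors the two groups of progress update queries described in the ticket. */
    enum Criticality { CRITICAL, TRIVIAL }

    /**
     * Runs a metadata update. A trivial update is tried once and a failure is only logged.
     * A critical update is retried with exponential backoff and rethrown when retries run out.
     * Returns true if the update succeeded.
     */
    static boolean runUpdate(Runnable update, Criticality criticality,
                             int maxAttempts, long baseDelayMillis) {
        int attempts = criticality == Criticality.CRITICAL ? Math.max(1, maxAttempts) : 1;
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                update.run();
                return true;
            } catch (RuntimeException e) {
                if (criticality == Criticality.TRIVIAL) {
                    System.err.println("Ignoring failed trivial update: " + e.getMessage());
                    return false; // the job continues
                }
                if (attempt >= attempts) throw e; // critical and out of retries: fail the job
                sleep(delay);
                delay *= 2; // exponential backoff: base, 2x base, 4x base, ...
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }
}
```

A caller would wrap each status write, for example `runUpdate(() -> writeJobState(), Criticality.CRITICAL, 5, 100L)`, where `writeJobState` is again a hypothetical method standing in for the real query.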
[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214206#comment-17214206 ]

Chris Lohfink commented on CASSANDRA-15241:
-------------------------------------------

I gave a talk on why I feel it's necessary last year at ApacheCon. That said, it's super late in the release and it's a pretty big patch (mostly just code making Mutations and Messages human readable), so I understand it not going in. Review feedback has been addressed, I believe, so I am waiting on that.

> Virtual table to expose current running queries
> -----------------------------------------------
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
> Issue Type: New Feature
> Components: Feature/Virtual Tables
> Reporter: Chris Lohfink
> Assignee: Chris Lohfink
> Priority: Normal
> Fix For: 4.0
>
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>  Native-Transport-Requests-4  |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>  Native-Transport-Requests-6  |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>  ReadStage-10                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-13                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-14                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-19                 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-20                 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-22                 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-23                 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-5                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-7                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>  ReadStage-8                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}
[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214205#comment-17214205 ]

Jordan West commented on CASSANDRA-16148:
-----------------------------------------

Committed. Thanks.

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
> Fix For: 4.0-beta3
>
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[jira] [Updated] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan West updated CASSANDRA-16148:
------------------------------------
    Fix Version/s: 4.0-beta3
    Since Version: 4.0-beta3
    Source Control Link: https://github.com/jrwest/cassandra/commit/9a40e8079baff6f499229535a4af75be97f9a3b9 https://github.com/apache/cassandra/commit/06bc316c89053067d162da3f118b43a62dcf0854
    Resolution: Fixed
    Status: Resolved  (was: Ready to Commit)

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
> Fix For: 4.0-beta3
>
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[jira] [Updated] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan West updated CASSANDRA-16148:
------------------------------------
    Status: Ready to Commit  (was: Changes Suggested)

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[cassandra] branch trunk updated (3a05ed3 -> 06bc316)
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 3a05ed3  Follow-up: fix test failures caused by 16207.
     add 9a40e80  upgrade to dtest-api 0.0.6
     add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0
     add a6c2224  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 06bc316  Merge branch 'cassandra-3.11' into trunk

No new revisions were added by this update.

Summary of changes:
 build.xml                                          |   2 +-
 src/java/org/apache/cassandra/gms/Gossiper.java    |  14 +--
 .../cassandra/utils/ExpiringMemoizingSupplier.java | 132 +
 .../impl/DelegatingInvokableInstance.java          |   6 +
 .../cassandra/distributed/impl/Instance.java       |  16 ++-
 .../cassandra/distributed/test/ReadRepairTest.java |  15 +++
 .../org/apache/cassandra/gms/GossiperTest.java     |   8 +-
 7 files changed, 180 insertions(+), 13 deletions(-)
 create mode 100644 src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java
[cassandra] branch cassandra-3.11 updated (d3f7bdf -> a6c2224)
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 9a40e80  upgrade to dtest-api 0.0.6
     add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0
     add a6c2224  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 build.xml                                                           | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java     | 6 ++
 .../distributed/org/apache/cassandra/distributed/impl/Instance.java | 5 +
 3 files changed, 12 insertions(+), 1 deletion(-)
[cassandra] branch cassandra-2.2 updated (521a6e2 -> 9a40e80)
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-2.2
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 521a6e2  Fixed a NullPointerException when calling nodetool enablethrift
     add 9a40e80  upgrade to dtest-api 0.0.6

No new revisions were added by this update.

Summary of changes:
 build.xml                                                           | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java     | 6 ++
 .../distributed/org/apache/cassandra/distributed/impl/Instance.java | 5 +
 3 files changed, 12 insertions(+), 1 deletion(-)
[cassandra] branch cassandra-3.0 updated (6eeca9d -> 7b3a15d)
This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 6eeca9d  Fix NPE when calling broadcast address on unintialized node
     add 9a40e80  upgrade to dtest-api 0.0.6
     add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0

No new revisions were added by this update.

Summary of changes:
 build.xml                                                       | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java | 7 +++
 .../org/apache/cassandra/distributed/impl/Instance.java         | 5 +
 3 files changed, 13 insertions(+), 1 deletion(-)
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214127#comment-17214127 ]

Adam Holmberg commented on CASSANDRA-15996:
-------------------------------------------

[~Bereng] I noticed that too. I've been staring at {{NoSpamLogger}} for a bit and haven't seen a way that it should fail in this way with a single request in flight. What did you have in mind for an edge case?

I looked a bit at the logs from the other failure and noticed one anomaly. I'm not sure how it could be related, but I noticed that the server never emits the "Startup complete" message.

We only have one example of this. The logs from the test run on this ticket have expired out of Circle. I was coming here to ask [~dcapwell] or anyone else whether they have other examples of this failing where the log files are still retained.

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E   assert []
> {code}
[jira] [Updated] (CASSANDRA-16209) Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-16209:
----------------------------------------
    Reviewers: Marcus Eriksson

> Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair
> -------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16209
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Compaction
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.0-beta
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a preview repair on repaired data identifies which SSTables to validate, it might come across an SSTable that's still pending for an in-progress incremental repair session. It makes sense that we immediately fail the preview repair in that case, but the resulting error and verbose stack trace in the logs is a bit too severe a reaction. We should downgrade this to a simple warning message.
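The downgrade described in CASSANDRA-16209 above amounts to catching the one expected conflict and logging its message at warning level without the stack trace, while still failing the preview repair. A rough sketch under assumptions: `PendingRepairConflictException` and `validate` are hypothetical names, and `java.util.logging` is used only to keep the example self-contained (it is not the logging API Cassandra uses).

```java
import java.util.logging.Logger;

// Hypothetical sketch; the exception type and method are illustrative, not Cassandra classes.
public class PreviewRepairLoggingSketch {
    /** Stand-in for the expected conflict with an in-progress incremental repair. */
    static final class PendingRepairConflictException extends RuntimeException {
        PendingRepairConflictException(String message) { super(message); }
    }

    /** Runs validation; an expected conflict fails the preview repair with a one-line warning. */
    static boolean validate(Runnable validation, Logger logger) {
        try {
            validation.run();
            return true;
        } catch (PendingRepairConflictException e) {
            // Message only: the throwable is deliberately not passed, so no stack trace is logged.
            logger.warning("Preview repair validation failed: " + e.getMessage());
            return false;
        }
    }
}
```

Any other exception still propagates, so unexpected failures keep their full error and stack trace.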
[jira] [Comment Edited] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214074#comment-17214074 ]

David Capwell edited comment on CASSANDRA-16148 at 10/14/20, 5:17 PM:
----------------------------------------------------------------------

I feel that https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885 is because you didn't rebase... I added a log line to all branches to detect when we complete startup, and it looks like it times out after 10m since it never sees that log.

Confirmed, https://github.com/jrwest/cassandra/commits/jwest/16148 doesn't have the commit which checks for the log.

was (Author: dcapwell):
I feel that https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885 is because you didn't rebase... I added a log line to all branches to detect when we complete startup, and it looks like it times out after 10m since it never sees that log.

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214075#comment-17214075 ]

David Capwell commented on CASSANDRA-16148:
-------------------------------------------

+1 from me

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833
[ https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214074#comment-17214074 ]

David Capwell commented on CASSANDRA-16148:
-------------------------------------------

I feel that https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885 is because you didn't rebase... I added a log line to all branches to detect when we complete startup, and it looks like it times out after 10m since it never sees that log.

> Test failures caused by merging CASSANDRA-15833
> -----------------------------------------------
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running without {{Feature.GOSSIP}}
[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214046#comment-17214046 ]

David Capwell commented on CASSANDRA-15935:
-------------------------------------------

Since Aleksey is on board with Action, I will back off and not argue the point.

> Improve machinery for testing consistency in presence of range movements
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-15935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15935
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest/java
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
>
> Currently, we can test range movements only by adding and bootstrapping a new node. This is both inefficient and insufficient for large-scale tests. We need a way to dynamically change ring ownership over the lifetime of a cluster, with the flexibility to change the gossip status of a node from the perspective of other participants, adding and removing nodes from other nodes' views on demand.
[jira] [Comment Edited] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
[ https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214043#comment-17214043 ]

Dmitrii Saprykin edited comment on CASSANDRA-16091 at 10/14/20, 4:45 PM:
-------------------------------------------------------------------------

Is this issue fixed by CASSANDRA-16127 ?

was (Author: saprykin):
Is this issue fixed by CASSANDRA-16124 ?

> rpc server gets wrongly initialized with rpc_enabled:false
> ----------------------------------------------------------
>
> Key: CASSANDRA-16091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16091
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Config
> Reporter: Tom van der Woerdt
> Assignee: David Capwell
> Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception is thrown:
> {code:java}
> java.lang.RuntimeException: Client SSL is not supported for non-blocking sockets (hsha). Please remove client ssl from the configuration.
>     at org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74)
>     at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55)
>     at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:128)
>     at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55)
>     at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713)
>     at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768)
> {code}
> No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in both versions.
> I created this Jira issue because clearly something changed between 3.11.7 and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which is not clearly documented to be about Thrift only) from `hsha` to `sync` does resolve the issue, as expected, but this does seem like a regression, hence the Jira issue.
[jira] [Commented] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false
[ https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214043#comment-17214043 ] Dmitrii Saprykin commented on CASSANDRA-16091: -- Is this issue fixed by CASSANDRA-16124 ? > rpc server gets wrongly initialized with rpc_enabled:false > -- > > Key: CASSANDRA-16091 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16091 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Tom van der Woerdt >Assignee: David Capwell >Priority: Normal > Fix For: 2.2.x, 3.0.x, 3.11.x > > > After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception > is thrown: > {code:java} > java.lang.RuntimeException: Client SSL is not supported for non-blocking > sockets (hsha). Please remove client ssl from the configuration. > at > org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74) > at > org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55) > at > org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.(ThriftServer.java:128) > at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55) > at > org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713) > at > org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) > {code} > No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in > both versions. > I created this Jira issue because clearly something changed between 3.11.7 > and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which > is not clearly documented to be about Thrift only) from `hsha` to `sync` does > resolve the issue, as expected, but this does seem like a regression, hence > the Jira issue. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
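The behaviour the reporter of CASSANDRA-16091 expected can be sketched roughly as follows. This is an illustration only, not the actual fix and not Cassandra code: `start_client_transports`, `build_thrift_server`, and the `client_encryption_enabled` key are hypothetical names; only `rpc_enabled`, `rpc_server_type`, `hsha`, `sync`, and the error text come from the report.

```python
class ConfigurationError(RuntimeError):
    """Hypothetical stand-in for the RuntimeException in the report."""


def build_thrift_server(config):
    # Mirrors the check visible in the stack trace: hsha cannot be combined
    # with client SSL. The config key for SSL is an assumption.
    if config["rpc_server_type"] == "hsha" and config.get("client_encryption_enabled", False):
        raise ConfigurationError(
            "Client SSL is not supported for non-blocking sockets (hsha). "
            "Please remove client ssl from the configuration."
        )
    return "thrift[" + config["rpc_server_type"] + "]"


def start_client_transports(config):
    """Return the transports that would be started for this config."""
    started = ["native"]
    # Only touch the Thrift server factory when Thrift is actually enabled,
    # so Thrift-only validation cannot block startup with rpc_enabled: false.
    if config.get("rpc_enabled", False):
        started.append(build_thrift_server(config))
    return started
```

With `rpc_enabled` set to false, the `hsha` plus client SSL combination is never validated, which matches the reporter's expectation that `rpc_server_type` should only matter for Thrift.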
[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination
[ https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214034#comment-17214034 ] Caleb Rackliffe commented on CASSANDRA-16180: - I'd also propose that we leave Paxos/CAS out of scope for this issue. CC [~benedict] > 4.0 quality testing: Coordination > - > > Key: CASSANDRA-16180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16180 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on coordination. > I think that the main reference dtest for this is > [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16181) 4.0 quality testing: Replication
[ https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214032#comment-17214032 ] Caleb Rackliffe commented on CASSANDRA-16181: - See https://issues.apache.org/jira/browse/CASSANDRA-16180?focusedCommentId=17214031=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17214031, but focusing on hints and the write path. > 4.0 quality testing: Replication > > > Key: CASSANDRA-16181 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16181 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on replication. > I think that the main reference dtest for this is > [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination
[ https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214031#comment-17214031 ] Caleb Rackliffe commented on CASSANDRA-16180: - [~adelapena] [~bdeggleston] I know the main focus of this issue is testing, but I want to propose something that bleeds into our documentation and code organization along the way. {{StorageProxy}} is one of the most critical classes in the entire project, but it is almost exactly 3000 lines of code and has zero class-level JavaDoc. We should break it up into its major constituent parts (hints, Paxos, point reads, range reads, etc.) and consider testing those constituent parts in isolation. (There is a {{StorageProxyTest}}, but it's really just a test for some utilities that also happen to be jammed into {{StorageProxy}}.) We don't have to boil the ocean either. You, [~jasonstack], and I know that SAI is already likely going to pull the range read logic out of {{StorageProxy}}, so pulling that forward (again, assuming we have reasonable tests to avoid risk) along w/ point reads could be a good first step. (That also corresponds pretty closely to this Jira in particular.) > 4.0 quality testing: Coordination > - > > Key: CASSANDRA-16180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16180 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on coordination. > I think that the main reference dtest for this is > [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. 
[jira] [Updated] (CASSANDRA-16181) 4.0 quality testing: Replication
[ https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16181: Reviewers: Caleb Rackliffe > 4.0 quality testing: Replication > > > Key: CASSANDRA-16181 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16181 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on replication. > I think that the main reference dtest for this is > [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16181) 4.0 quality testing: Replication
[ https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214025#comment-17214025 ] Caleb Rackliffe commented on CASSANDRA-16181: - [~adelapena] Do you think we should cover hints/hinted handoff here? In some sense, this and CASSANDRA-16180 are both about coordinator hardening, but this one is on the write side, and that one the read side. > 4.0 quality testing: Replication > > > Key: CASSANDRA-16181 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16181 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on replication. > I think that the main reference dtest for this is > [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16180) 4.0 quality testing: Coordination
[ https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-16180: Reviewers: Caleb Rackliffe > 4.0 quality testing: Coordination > - > > Key: CASSANDRA-16180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16180 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0 > > > This is a subtask of CASSANDRA-15579 focusing on coordination. > I think that the main reference dtest for this is > [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py]. > We should identify which other tests cover this and identify what should be > extended, similarly to what has been done with CASSANDRA-15977. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214013#comment-17214013 ] Alex Petrov commented on CASSANDRA-16057: - +1 > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Fix Version/s: (was: 4.0-beta3) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-16207: Status: Resolved (was: Open) Thank you [~marcuse]! Committed a follow-up to [3a05ed3ce15ab4dcd5f13b9b56c18c0198c0e203|https://github.com/apache/cassandra/commit/3a05ed3ce15ab4dcd5f13b9b56c18c0198c0e203] > NPE when calling broadcast address on unintialized node > --- > > Key: CASSANDRA-16207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16207 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.23, 3.11.9, 4.0-beta3 > > > When trying to run upgrades, sometimes we’re calling broadcasts addrerss on > an uninitialised new node: > {code} > java.lang.IllegalStateException: Can't use shut down instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53) > > at > org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278) > > at > org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) > ~[dtest-3.0.19.jar:?] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213) > > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182) > > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214000#comment-17214000 ]

Yifan Cai commented on CASSANDRA-16057:
---
Good catch! It turns out that my previous command used to list
{{(System.out | System.err)}} usage was wrong. The "-u" is unnecessary.
I have fixed the 3.11 branch.

{code:java}
08:14:09 in cassandra on b/CASSANDRA-16057-3.11
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/LoaderOptions.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneSSTableUtil.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:
src/java/org/apache/cassandra/tools/nodetool/formatter/TableBuilder.java:

08:15:16 in cassandra on b/CASSANDRA-16057-3.0
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneSSTableUtil.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:

08:15:26 in cassandra on b/CASSANDRA-16057-2.2
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableImport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:
{code}

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
> Issue Type: Improvement
> Components: Test/dtest/java
> Reporter: David Capwell
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: NA
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using
> in-jvm dtest should expose that to tests.
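The listing command in the comment above can also be sketched in Python, which may be handy where `egrep` and `awk` are not available. This is a rough equivalent, not the command that was actually run: `files_using_console` is a hypothetical helper name, and unlike the shell pattern it escapes the dot, so it matches the literal text `System.out` or `System.err` only.

```python
import re
from pathlib import Path

# Literal match for System.out / System.err (the egrep pattern in the
# comment leaves the dot unescaped, which also matches any character there).
CONSOLE_USE = re.compile(r"System\.out|System\.err")


def files_using_console(root):
    """Return the sorted, de-duplicated paths under root that mention
    System.out or System.err, similar to `egrep -r ... | sort | uniq`."""
    hits = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable files are skipped rather than failing the scan
        if CONSOLE_USE.search(text):
            hits.add(path.as_posix())
    return sorted(hits)
```

Calling `files_using_console("src/java/org/apache/cassandra/tools")` from a checkout would give one entry per file, which is what the `awk | sort | uniq` stage reduces the grep output to.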
[cassandra] branch trunk updated (6098762 -> 3a05ed3)
This is an automated email from the ASF dual-hosted git repository. ifesdjeen pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git. from 6098762 Fail truncation requests when they fail on a replica add 3a05ed3 Follow-up: fix test failures caused by 16207. No new revisions were added by this update. Summary of changes: .../apache/cassandra/distributed/impl/Instance.java | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213989#comment-17213989 ] Alex Petrov commented on CASSANDRA-16057: - Code looks good. The only thing is that in 3.11 we still use {{System.out}} in {{ViewBuildStatus.java}} and {{Info.java}}. > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213979#comment-17213979 ] Marcus Eriksson commented on CASSANDRA-16207: - +1 > NPE when calling broadcast address on unintialized node > --- > > Key: CASSANDRA-16207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16207 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.23, 3.11.9, 4.0-beta3 > > > When trying to run upgrades, sometimes we’re calling broadcasts addrerss on > an uninitialised new node: > {code} > java.lang.IllegalStateException: Can't use shut down instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53) > > at > org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278) > > at > org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) > ~[dtest-3.0.19.jar:?] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213) > > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182) > > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica
[ https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-16208: - Since Version: NA Source Control Link: https://github.com/apache/cassandra/commit/609876275738589fdfb9a3e20cb2f594aa404037 Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed, thanks! > Fail truncation requests when they fail on replica > -- > > Key: CASSANDRA-16208 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16208 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta3 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica
[ https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-16208: - Status: Ready to Commit (was: Review In Progress) > Fail truncation requests when they fail on replica > -- > > Key: CASSANDRA-16208 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16208 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta3 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra-dtest] branch master updated: Add test_truncate_failure
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git

The following commit(s) were added to refs/heads/master by this push:
     new 8cb6bd2  Add test_truncate_failure
8cb6bd2 is described below

commit 8cb6bd23e62c4d3b4e208d3909361d6812182bc6
Author: Ekaterina Dimitrova
AuthorDate: Thu Oct 8 09:23:00 2020 -0400

    Add test_truncate_failure

    Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for CASSANDRA-16208
---
 byteman/truncate_fail.btm |  8
 cql_test.py               | 33 +
 2 files changed, 41 insertions(+)

diff --git a/byteman/truncate_fail.btm b/byteman/truncate_fail.btm
new file mode 100644
index 000..fa9caba
--- /dev/null
+++ b/byteman/truncate_fail.btm
@@ -0,0 +1,8 @@
+RULE Throw during truncate operation
+CLASS org.apache.cassandra.db.ColumnFamilyStore
+METHOD truncateBlocking()
+AT ENTRY
+IF TRUE
+DO
+   throw new RuntimeException("Dummy failure");
+ENDRULE
\ No newline at end of file
diff --git a/cql_test.py b/cql_test.py
index eced21d..dde7b7d 100644
--- a/cql_test.py
+++ b/cql_test.py
@@ -1,4 +1,5 @@
 import itertools
+import re
 import struct
 import time
 import pytest
@@ -764,6 +765,38 @@ class TestMiscellaneousCQL(CQLTester):
                     [2, None, 2, None],
                     [3, None, 3, None]])
 
+    @since("4.0")
+    def test_truncate_failure(self):
+        """
+        @jira_ticket CASSANDRA-16208
+        Tests that if a TRUNCATE query fails on some replica, the coordinator will immediately return an error to the
+        client instead of waiting to time out because it couldn't get the necessary number of success acks.
+        """
+        cluster = self.cluster
+        cluster.populate(3, install_byteman=True).start()
+        node1, _, node3 = cluster.nodelist()
+        node3.byteman_submit(['./byteman/truncate_fail.btm'])
+
+        session = self.patient_exclusive_cql_connection(node1)
+        create_ks(session, 'ks', 3)
+
+        logger.debug("Creating data table")
+        session.execute("CREATE TABLE data (id int PRIMARY KEY, data text)")
+        session.execute("UPDATE data SET data = 'Awesome' WHERE id = 1")
+
+        self.fixture_dtest_setup.ignore_log_patterns = ['Dummy failure']
+        logger.debug("Truncating data table (error expected)")
+
+        thrown = False
+        exception = None
+        try:
+            session.execute("TRUNCATE data")
+        except Exception as e:
+            exception = e
+            thrown = True
+
+        assert thrown, "No exception has been thrown"
+        assert re.search("Truncate failed on replica /127.0.0.3", str(exception)) is not None
 
 @since('3.2')
 class AbortedQueryTester(CQLTester):
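The behaviour described in the test's docstring (fail as soon as one replica reports a failure, instead of waiting for the timeout) can be sketched in isolation as below. This is an illustrative model in Python, not the Java `TruncateResponseHandler` from CASSANDRA-16208: the class and method names (`TruncateResponseTracker`, `on_response`, `wait`, `TruncateError`) are hypothetical, and only the error text "Truncate failed on replica" is taken from the test's assertion.

```python
import threading


class TruncateError(Exception):
    """Hypothetical error raised when a replica reports a failed truncate."""


class TruncateResponseTracker:
    """Collects replica acks; a single failure completes the wait early."""

    def __init__(self, expected_responses):
        self._remaining = expected_responses
        self._failed_replica = None
        self._lock = threading.Lock()
        self._done = threading.Event()

    def on_response(self, replica, success):
        with self._lock:
            if not success:
                # Remember the first failing replica and wake the waiter
                # immediately, without waiting for the remaining acks.
                if self._failed_replica is None:
                    self._failed_replica = replica
                self._done.set()
                return
            self._remaining -= 1
            if self._remaining <= 0:
                self._done.set()

    def wait(self, timeout_seconds):
        if not self._done.wait(timeout_seconds):
            raise TimeoutError("Truncate timed out")
        if self._failed_replica is not None:
            raise TruncateError(
                "Truncate failed on replica " + str(self._failed_replica))
```

In this model the coordinator thread calls `wait` while response callbacks call `on_response`; a failure from `/127.0.0.3` surfaces as an error right away, which is the condition the dtest checks with `re.search`.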
[cassandra] branch trunk updated: Fail truncation requests when they fail on a replica
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 6098762  Fail truncation requests when they fail on a replica
6098762 is described below

commit 609876275738589fdfb9a3e20cb2f594aa404037
Author: Ekaterina Dimitrova
AuthorDate: Mon Oct 12 18:11:51 2020 -0400

    Fail truncation requests when they fail on a replica

    Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for CASSANDRA-16208
---
 CHANGES.txt                                        |  1 +
 .../apache/cassandra/db/TruncateVerbHandler.java   | 24 +--
 .../cassandra/service/TruncateResponseHandler.java | 27 ++
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index a7701c7..fe3fef8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-beta3
+ * Fail truncation requests when they fail on a replica (CASSANDRA-16208)
  * Move compact storage validation earlier in startup process (CASSANDRA-16063)
  * Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table (CASSANDRA-16155)
  * Consolidate node liveness check for forced repair (CASSANDRA-16113)
diff --git a/src/java/org/apache/cassandra/db/TruncateVerbHandler.java b/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
index c605d1f..0d71464 100644
--- a/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
+++ b/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
@@ -34,31 +34,31 @@ public class TruncateVerbHandler implements IVerbHandler
 
     public void doVerb(Message message)
     {
-        TruncateRequest t = message.payload;
-        Tracing.trace("Applying truncation of {}.{}", t.keyspace, t.table);
+        TruncateRequest truncation = message.payload;
+        Tracing.trace("Applying truncation of {}.{}", truncation.keyspace, truncation.table);
         try
         {
-            ColumnFamilyStore cfs = Keyspace.open(t.keyspace).getColumnFamilyStore(t.table);
+            ColumnFamilyStore cfs = Keyspace.open(truncation.keyspace).getColumnFamilyStore(truncation.table);
             cfs.truncateBlocking();
         }
-        catch (Exception e)
+        catch (Throwable throwable)
         {
-            logger.error("Error in truncation", e);
-            respondError(t, message);
+            logger.error("Error in truncation", throwable);
+            respondError(truncation, message);
 
-            if (FSError.findNested(e) != null)
-                throw FSError.findNested(e);
+            if (FSError.findNested(throwable) != null)
+                throw FSError.findNested(throwable);
         }
         Tracing.trace("Enqueuing response to truncate operation to {}", message.from());
 
-        TruncateResponse response = new TruncateResponse(t.keyspace, t.table, true);
-        logger.trace("{} applied. Enqueuing response to {}@{} ", t, message.id(), message.from());
+        TruncateResponse response = new TruncateResponse(truncation.keyspace, truncation.table, true);
+        logger.trace("{} applied. Enqueuing response to {}@{} ", truncation, message.id(), message.from());
         MessagingService.instance().send(message.responseWith(response), message.from());
     }
 
-    private static void respondError(TruncateRequest t, Message truncateRequestMessage)
+    private static void respondError(TruncateRequest truncation, Message truncateRequestMessage)
     {
-        TruncateResponse response = new TruncateResponse(t.keyspace, t.table, false);
+        TruncateResponse response = new TruncateResponse(truncation.keyspace, truncation.table, false);
         MessagingService.instance().send(truncateRequestMessage.responseWith(response), truncateRequestMessage.from());
     }
 }
diff --git a/src/java/org/apache/cassandra/service/TruncateResponseHandler.java b/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
index bcd7426..c2651e6 100644
--- a/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
+++ b/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.service;
 
+import java.net.InetAddress;
 import java.util.concurrent.TimeoutException;
 import java.util.concurrent.atomic.AtomicInteger;
 
@@ -24,19 +25,22 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.TruncateResponse;
+import org.apache.cassandra.exceptions.TruncateException;
 import org.apache.cassandra.net.RequestCallback;
 import org.apache.cassandra.net.Message;
 import org.apache.cassandra.utils.concurrent.SimpleCondition;
 
 import static java.util.concurrent.TimeUnit.NANOSECONDS;
 
-public class TruncateResponseHandler implements RequestCallback
[jira] [Comment Edited] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213973#comment-17213973 ] Alex Petrov edited comment on CASSANDRA-16207 at 10/14/20, 2:52 PM: This patch caused several test failures. Follow-up/fix: |[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]| was (Author: ifesdjeen): This patch caused several test failures. |[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]| > NPE when calling broadcast address on unintialized node > --- > > Key: CASSANDRA-16207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16207 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.23, 3.11.9, 4.0-beta3 > > > When trying to run upgrades, sometimes we’re calling broadcasts addrerss on > an uninitialised new node: > {code} > java.lang.IllegalStateException: Can't use shut down instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53) > > at > org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278) > > at > org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) > ~[dtest-3.0.19.jar:?] 
> at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213) > > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182) > > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213973#comment-17213973 ] Alex Petrov commented on CASSANDRA-16207: - This patch caused several test failures. |[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]| > NPE when calling broadcast address on unintialized node > --- > > Key: CASSANDRA-16207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16207 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.23, 3.11.9, 4.0-beta3 > > > When trying to run upgrades, sometimes we’re calling broadcasts addrerss on > an uninitialised new node: > {code} > java.lang.IllegalStateException: Can't use shut down instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53) > > at > org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278) > > at > org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) > ~[dtest-3.0.19.jar:?] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213) > > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182) > > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-16207: Status: Open (was: Resolved) > NPE when calling broadcast address on unintialized node > --- > > Key: CASSANDRA-16207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16207 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.23, 3.11.9, 4.0-beta3 > > > When trying to run upgrades, sometimes we’re calling broadcast address on > an uninitialised new node: > {code} > java.lang.IllegalStateException: Can't use shut down instances, delegate is > null > at > org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163) > at > org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53) > > at > org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278) > > at > org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) > ~[dtest-3.0.19.jar:?] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213) > > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182) > > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16209) Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213480#comment-17213480 ] Caleb Rackliffe edited comment on CASSANDRA-16209 at 10/14/20, 2:38 PM: [patch|https://github.com/apache/cassandra/pull/776] [CircleCI|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16209] Note: The failures in the first round of tests look mostly related to CASSANDRA-16148 was (Author: maedhroz): [patch|https://github.com/apache/cassandra/pull/776] [CircleCI|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16209] > Log Warning Rather than Verbose Trace when Preview Repair Validation > Conflicts with Incremental Repair > -- > > Key: CASSANDRA-16209 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16209 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a preview repair on repaired data identifies which SSTables to validate, > it might come across an SSTable that's still pending for an in-progress > incremental repair session. It makes sense that we immediately fail the > preview repair in that case, but the resulting error and verbose stack trace > in the logs is a bit too severe a reaction. We should downgrade this to a > simple warning message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Description: Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was reported which pointed to a race condition to be spotted. (was: Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one failure there was reported which pointed to a race condition to be spotted. ) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Fix Version/s: 4.0-beta3 3.11.x > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > failure there was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Description: Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one failure there was reported which pointed to a race condition to be spotted. > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > failure there was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Bug Category: Parent values: Correctness(12982) Complexity: Normal Component/s: Cluster/Schema Discovered By: Unit Test Severity: Normal Status: Open (was: Triage Needed) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
Ekaterina Dimitrova created CASSANDRA-16210: --- Summary: Synchronize Keyspace instance store/clear Key: CASSANDRA-16210 URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 Project: Cassandra Issue Type: Bug Reporter: Ekaterina Dimitrova Assignee: Ekaterina Dimitrova -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16197) Upgrade the metrics version
[ https://issues.apache.org/jira/browse/CASSANDRA-16197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213941#comment-17213941 ] Benjamin Lerer commented on CASSANDRA-16197: It is probably easier and cleaner to open a new one at that time if we have the need for it. > Upgrade the metrics version > --- > > Key: CASSANDRA-16197 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16197 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > > The current metrics version used by Cassandra is 3.1.5 which was not compiled > and targeted for the JDK 8 > (https://metrics.dropwizard.io/4.1.2/about/release-notes.html). > There are several bug fixes that would also be interesting to get. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15581) 4.0 quality testing: Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213926#comment-17213926 ] Marcus Eriksson commented on CASSANDRA-15581: - I think this task should focus mostly on the mechanics of picking sstables for compaction, not the actual merging of sstables (though, that will of course also be tested by anything we do here). What [~paulo] defined above would be a good start. Major compaction changes were CASSANDRA-6696 and CASSANDRA-7019.
* Run all tests with different amounts of data directories (1/5/20)
* Run all tests with different compaction strategies (LCS/STCS/TWCS)
* Run LCS tests with {{single_sstable_uplevel}} on/off - CASSANDRA-12526
* Bootstrap/decom/replace, make sure disk usage is balanced on new + old nodes
* Heavy compaction load + range movements
* Heavy compaction load + ALTER .. WITH compaction = ..
* Heavy compaction load + incremental repair / anticompaction
* Test large node upgrades with several data directories (3.0 -> 4.0 probably most interesting here)
* Test `nodetool garbagecollect` with large datasets and many tombstones.
> 4.0 quality testing: Compaction > --- > > Key: CASSANDRA-15581 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15581 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/python >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Marcus Eriksson* > Alongside the local and distributed read/write paths, we'll also want to > validate compaction. CASSANDRA-6696 introduced substantial > changes/improvements that require testing (esp. JBOD). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213919#comment-17213919 ] Josh McKenzie commented on CASSANDRA-15241: --- [~clohfink] - It's a new feature so we wouldn't put it in 4.0 right? I don't *think* this is one of the ones we discussed on the ML/slack about straddling the freeze (inferring from dates on the ticket here). Feel free to correct me if I'm wrong on that though; we've been talking a lot lately. > Virtual table to expose current running queries > --- > > Key: CASSANDRA-15241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15241 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Virtual Tables >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Fix For: 4.0 > > > Expose current running queries and their duration.
> {code}
> cqlsh> select * from system_views.queries;
> thread_id                    | duration_micros | task
> -----------------------------+-----------------+------
> Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
> Native-Transport-Requests-4  |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
> Native-Transport-Requests-6  |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
> ReadStage-10                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-13                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-14                 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-19                 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-20                 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-22                 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-23                 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-5                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-7                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> ReadStage-8                  |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16151) Package tools/bin scripts as executable
[ https://issues.apache.org/jira/browse/CASSANDRA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16151: Bug Category: Parent values: Packaging(13660)Level 1 values: Source Distribution(13661) (was: Parent values: Code(13163)Level 1 values: Bug - Unclear Impact(13164)) > Package tools/bin scripts as executable > --- > > Key: CASSANDRA-16151 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16151 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Angelo Polo >Assignee: Angelo Polo >Priority: Normal > Labels: patch > Fix For: 4.0-beta, 3.11.9 > > Attachments: 3.11-Package-tools-bin-scripts-as-executable.patch, > trunk-Package-tools-bin-scripts-as-executable.patch > > > The tools/bin scripts aren't packaged as executable in the source > distributions, though in the repository the scripts have the right bits. > This causes, on 3.11.8 for example, the tests in > org.apache.cassandra.cql3.EmptyValuesTest to fail: > {{java.io.IOException: Cannot run program "tools/bin/sstabledump": error=13, > Permission denied}} > {{[junit-timeout] junit.framework.AssertionFailedError: java.io.IOException}} > {{[junit-timeout] at > org.apache.cassandra.cql3.EmptyValuesTest.verify(EmptyValuesTest.java:85)}} > {{[junit-timeout] at > org.apache.cassandra.cql3.EmptyValuesTest.verifyJsonInsert(EmptyValuesTest.java:112)}} > See attached patch of build.xml for the trunk and cassandra-3.11 branches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213879#comment-17213879 ] Aleksey Yeschenko commented on CASSANDRA-15935: --- The difference isn't huge, and I myself don't have a *strong* preference either, but my weak preference goes to the more Java-y, {{Action}} route. > Improve machinery for testing consistency in presence of range movements > > > Key: CASSANDRA-15935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15935 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > > Currently, we can test range movements only by adding and bootstrapping a new > node. This is both inefficient and insufficient for large-scale tests. We > need a possibility to dynamically change ring ownership over the lifetime of > cluster, with a flexibility to changing gossip status of the node from > perspective of other participants, adding and removing nodes from other > nodes' views on demand. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799 ] Alex Petrov edited comment on CASSANDRA-15935 at 10/14/20, 10:17 AM: - Moving a [conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334] about {{Action}} vs static method here. bq. in your examples there are no real differences between run and forEach, so I rather have forEach only. You're right there are no real differences between {{run}} and {{forEach}}. However, I had several reasons to use interface implementations, which are: # {{Action}} is an atomic unit of logic, unlike a static method. You can immediately see all things related to a specific action, reuse, and move them at your discretion. Using static methods will quickly get out of hand when we have more sophisticated actions. # Separation of input arguments (for example "disseminate gossip state of the node X") and `target`, making `target` explicit and common for all cases. In some cases, we can even reduce amount of work we're doing, and do it once in a constructor. For example, [here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144], we create gossip state that gets disseminated by getting applied to each action. Contrast this with [this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143], where we either have to change to use instance collection as input, or re-create distributed state each time. # Main idea behind the `Action` was to create reusable pieces of logic you can apply to cluster or nodes. 
For now, logic is simple, like "pull schema from X, and then bootstrap", but we can reuse similar sequences of steps in: * (a) Harry, where we can schedule different actions against different sets of nodes, producing reliable results * (b) In upgrade test, where you'll be able to run named actions against instances despite the fact they have different versions. To do (a) and (b) with static methods, we'll have to _still_ implement some interface. # We can use static code analysis to find all Actions in the code. # We can chain actions, too: {code} cluster.run(asList(pullSchemaFrom(cluster.get(1)), bootstrap()), newInstance.config().num()); {code} I've used {{Action}} from the beginning with these intentions. Everyone I asked has no strong preference towards one or the other, and it's same with me: aside from the above arguments, difference is purely syntactic. Both approaches have equivalent semantics. was (Author: ifesdjeen): Moving a [conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334] about {{Action}} vs static method here. bq. in your examples there are no real differences between run and forEach, so I rather have forEach only. You're right there are no real differences between {{run}} and {{forEach}}. However, I had several reasons to use interface implementations, which are: 1. {{Action}} is an atomic unit of logic, unlike a static method. You can immediately see all things related to a specific action, reuse, and move them at your discretion. Using static methods will quickly get out of hand when we have more sophisticated actions. 2. Separation of input arguments (for example "disseminate gossip state of the node X") and `target`, making `target` explicit and common for all cases. In some cases, we can even reduce amount of work we're doing, and do it once in a constructor. 
For example, [here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144], we create gossip state that gets disseminated by getting applied to each action. Contrast this with [this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143], where we either have to change to use instance collection as input, or re-create distributed state each time. 3. Main idea behind the `Action` was to create reusable pieces of logic you can apply to cluster or nodes. For now, logic is simple, like "pull schema from X, and then bootstrap", but we can reuse similar sequences of steps in: a. Harry, where we can schedule different actions against different sets of nodes, producing reliable results b. In upgrade test, where you'll be able to run named actions against instances despite the fact they have different versions. To do (a) and (b) with static methods, we'll have to _still_ implement some
[jira] [Updated] (CASSANDRA-16151) Package tools/bin scripts as executable
[ https://issues.apache.org/jira/browse/CASSANDRA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andres de la Peña updated CASSANDRA-16151: -- Reviewers: Andres de la Peña > Package tools/bin scripts as executable > --- > > Key: CASSANDRA-16151 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16151 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Angelo Polo >Assignee: Angelo Polo >Priority: Normal > Labels: patch > Fix For: 4.0-beta, 3.11.9 > > Attachments: 3.11-Package-tools-bin-scripts-as-executable.patch, > trunk-Package-tools-bin-scripts-as-executable.patch > > > The tools/bin scripts aren't packaged as executable in the source > distributions, though in the repository the scripts have the right bits. > This causes, on 3.11.8 for example, the tests in > org.apache.cassandra.cql3.EmptyValuesTest to fail: > {{java.io.IOException: Cannot run program "tools/bin/sstabledump": error=13, > Permission denied}} > {{[junit-timeout] junit.framework.AssertionFailedError: java.io.IOException}} > {{[junit-timeout] at > org.apache.cassandra.cql3.EmptyValuesTest.verify(EmptyValuesTest.java:85)}} > {{[junit-timeout] at > org.apache.cassandra.cql3.EmptyValuesTest.verifyJsonInsert(EmptyValuesTest.java:112)}} > See attached patch of build.xml for the trunk and cassandra-3.11 branches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799 ] Alex Petrov edited comment on CASSANDRA-15935 at 10/14/20, 10:16 AM: - Moving a [conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334] about {{Action}} vs static method here. bq. in your examples there are no real differences between run and forEach, so I rather have forEach only. You're right there are no real differences between {{run}} and {{forEach}}. However, I had several reasons to use interface implementations, which are: 1. {{Action}} is an atomic unit of logic, unlike a static method. You can immediately see all things related to a specific action, reuse, and move them at your discretion. Using static methods will quickly get out of hand when we have more sophisticated actions. 2. Separation of input arguments (for example "disseminate gossip state of the node X") and `target`, making `target` explicit and common for all cases. In some cases, we can even reduce amount of work we're doing, and do it once in a constructor. For example, [here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144], we create gossip state that gets disseminated by getting applied to each action. Contrast this with [this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143], where we either have to change to use instance collection as input, or re-create distributed state each time. 3. Main idea behind the `Action` was to create reusable pieces of logic you can apply to cluster or nodes. For now, logic is simple, like "pull schema from X, and then bootstrap", but we can reuse similar sequences of steps in: a. 
Harry, where we can schedule different actions against different sets of nodes, producing reliable results b. In upgrade test, where you'll be able to run named actions against instances despite the fact they have different versions. To do (a) and (b) with static methods, we'll have to _still_ implement some interface. 4. We can use static code analysis to find all Actions in the code. 5. We can chain actions, too: {code} cluster.run(asList(pullSchemaFrom(cluster.get(1)), bootstrap()), newInstance.config().num()); {code} I've used {{Action}} from the beginning with these intentions. Everyone I asked has no strong preference towards one or the other, and it's same with me: aside from the above arguments, difference is purely syntactic. Both approaches have equivalent semantics. was (Author: ifesdjeen): Moving a [conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334] about {{Action}} vs static method here. bq. in your examples there are no real differences between run and forEach, so I rather have forEach only. You're right there are no real differences between {{run}} and {{forEach}}. However, I had several reasons to use interface implementations, which are: 1. {{Action}} is an atomic unit of logic, unlike a static method. You can immediately see all things related to a specific action, reuse, and move them at your discretion. Using static methods will quickly get out of hand when we have more sophisticated actions. 2. Separation of input arguments (for example "disseminate gossip state of the node X") and `target`, making `target` explicit and common for all cases. 3. Main idea behind the `Action` was to create reusable pieces of logic you can apply to cluster or nodes. For now, logic is simple, like "pull schema from X, and then bootstrap", but we can reuse similar sequences of steps in: a. Harry, where we can schedule different actions against different sets of nodes, producing reliable results b. 
In upgrade test, where you'll be able to run named actions against instances despite the fact they have different versions. To do (a) and (b) with static methods, we'll have to _still_ implement some interface. 4. In some cases, we can just reduce amount of work we're doing. For example, [here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144], we create gossip state that gets disseminated by getting applied to each action. Contrast this with [this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143], where we either have to change to use instance collection as input, or re-create distributed state each time. 5. We can use static code
[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799 ] Alex Petrov commented on CASSANDRA-15935: - Moving a [conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334] about {{Action}} vs static method here. bq. in your examples there are no real differences between run and forEach, so I rather have forEach only. You're right there are no real differences between {{run}} and {{forEach}}. However, I had several reasons to use interface implementations, which are: 1. {{Action}} is an atomic unit of logic, unlike a static method. You can immediately see all things related to a specific action, reuse, and move them at your discretion. Using static methods will quickly get out of hand when we have more sophisticated actions. 2. Separation of input arguments (for example "disseminate gossip state of the node X") and `target`, making `target` explicit and common for all cases. 3. Main idea behind the `Action` was to create reusable pieces of logic you can apply to cluster or nodes. For now, logic is simple, like "pull schema from X, and then bootstrap", but we can reuse similar sequences of steps in: a. Harry, where we can schedule different actions against different sets of nodes, producing reliable results b. In upgrade test, where you'll be able to run named actions against instances despite the fact they have different versions. To do (a) and (b) with static methods, we'll have to _still_ implement some interface. 4. In some cases, we can just reduce amount of work we're doing. For example, [here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144], we create gossip state that gets disseminated by getting applied to each action. 
Contrast this with [this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143], where we either have to change to use instance collection as input, or re-create distributed state each time. 5. We can use static code analysis to find all Actions in the code. 6. We can chain actions, too: {code} cluster.run(asList(pullSchemaFrom(cluster.get(1)), bootstrap()), newInstance.config().num()); {code} I've used {{Action}} from the beginning with these intentions. Everyone I asked has no strong preference towards one or the other, and it's same with me: aside from the above arguments, difference is purely syntactic. Both approaches have equivalent semantics. > Improve machinery for testing consistency in presence of range movements > > > Key: CASSANDRA-15935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15935 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > > Currently, we can test range movements only by adding and bootstrapping a new > node. This is both inefficient and insufficient for large-scale tests. We > need a possibility to dynamically change ring ownership over the lifetime of > cluster, with a flexibility to changing gossip status of the node from > perspective of other participants, adding and removing nodes from other > nodes' views on demand. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
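The {{Action}} argument in the CASSANDRA-15935 comment above can be illustrated with a minimal sketch. Everything here is hypothetical and only mirrors the shape of the quoted {{cluster.run(asList(pullSchemaFrom(...), bootstrap()), ...)}} call: the {{Node}} class, the {{Action}} interface and the {{run}} helper are stand-ins invented for illustration, not the in-JVM dtest API or the code in the linked branches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ActionSketch {
    /** Hypothetical stand-in for a cluster node; it only records what was applied to it. */
    static final class Node {
        final int num;
        final List<String> log = new ArrayList<>();
        Node(int num) { this.num = num; }
    }

    /** Hypothetical interface: one reusable unit of logic applied to an explicit target. */
    interface Action {
        void accept(Node target);
    }

    /** Input argument (the source node) is bound here; the target is supplied later. */
    static Action pullSchemaFrom(Node source) {
        // Work that depends only on the input is done once, at construction time,
        // rather than once per target node.
        String message = "pull schema from node" + source.num;
        return target -> target.log.add(message);
    }

    static Action bootstrap() {
        return target -> target.log.add("bootstrap");
    }

    /** Applies a chain of actions, in order, to a single target. */
    static void run(List<Action> actions, Node target) {
        for (Action action : actions) {
            action.accept(target);
        }
    }

    public static void main(String[] args) {
        Node seed = new Node(1);
        Node fresh = new Node(2);
        run(Arrays.asList(pullSchemaFrom(seed), bootstrap()), fresh);
        System.out.println(fresh.log); // [pull schema from node1, bootstrap]
    }
}
```

The sketch shows the two properties the comment leans on: inputs are separated from the target, and actions compose as a plain list. A static-method version would be equivalent in semantics, as the comment itself notes.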
[jira] [Commented] (CASSANDRA-16196) Fix flaky test test_disk_balance_after_boundary_change_lcs - disk_balance_test.TestDiskBalance
[ https://issues.apache.org/jira/browse/CASSANDRA-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213787#comment-17213787 ] Berenguer Blasi commented on CASSANDRA-16196: - SGTM and there's no byteman either I can think of to catch pending deletes... :shrug: > Fix flaky test test_disk_balance_after_boundary_change_lcs - > disk_balance_test.TestDiskBalance > -- > > Key: CASSANDRA-16196 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16196 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > Attachments: node2-debug-end.log > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/622/workflows/adcd463c-156a-43c7-a9bc-7f3e4938dbe8/jobs/3514 > {code} > error_message = '' if 'error_message' not in kwargs else > kwargs['error_message'] > assert vmin > vmax * (1.0 - error) or vmin == vmax, \ > > "values not within {:.2f}% of the max: {} ({})".format(error * > > 100, args, error_message) > E AssertionError: values not within 10.00% of the max: (8022760, > 9192165, 4575645, 9235566, 9091014) (node2) > tools/assertions.py:206: AssertionError > {code} > Marking as distinct issue after chat in CASSANDRA-14030 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
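For readers following the CASSANDRA-16196 failure above, the quoted Python assertion can be restated as a small Java sketch. The class and method names are made up for illustration; only the comparison {{vmin > vmax * (1.0 - error) or vmin == vmax}} and the five values are taken from the quoted output.

```java
public class BalanceCheck {
    /**
     * Returns true when the smallest value is within `error` (a fraction, e.g. 0.10)
     * of the largest, mirroring the condition in the quoted assertion.
     */
    static boolean almostEqual(double error, long... values) {
        long vmin = Long.MAX_VALUE;
        long vmax = Long.MIN_VALUE;
        for (long v : values) {
            vmin = Math.min(vmin, v);
            vmax = Math.max(vmax, v);
        }
        return vmin > vmax * (1.0 - error) || vmin == vmax;
    }

    public static void main(String[] args) {
        // Values reported for node2 in the failing run.
        long[] sizes = {8022760L, 9192165L, 4575645L, 9235566L, 9091014L};
        // The threshold is 9235566 * 0.9 = 8312009.4, and the minimum 4575645 is below it.
        System.out.println(almostEqual(0.10, sizes)); // false
    }
}
```

Note that with a 10% tolerance the value 8022760 is also below the 8312009.4 threshold, so the check would still fail if the 4575645 entry were dropped.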
[jira] [Updated] (CASSANDRA-16196) Fix flaky test test_disk_balance_after_boundary_change_lcs - disk_balance_test.TestDiskBalance
[ https://issues.apache.org/jira/browse/CASSANDRA-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berenguer Blasi updated CASSANDRA-16196:
    Reviewers: Berenguer Blasi, Brandon Williams (was: Brandon Williams)

> Fix flaky test test_disk_balance_after_boundary_change_lcs - disk_balance_test.TestDiskBalance
>
> Key: CASSANDRA-16196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16196
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: node2-debug-end.log
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/622/workflows/adcd463c-156a-43c7-a9bc-7f3e4938dbe8/jobs/3514
> {code}
> error_message = '' if 'error_message' not in kwargs else kwargs['error_message']
> assert vmin > vmax * (1.0 - error) or vmin == vmax, \
> >     "values not within {:.2f}% of the max: {} ({})".format(error * 100, args, error_message)
> E   AssertionError: values not within 10.00% of the max: (8022760, 9192165, 4575645, 9235566, 9091014) (node2)
> tools/assertions.py:206: AssertionError
> {code}
> Marking as distinct issue after chat in CASSANDRA-14030
[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution
[ https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16201:
    Reviewers: Michael Semb Wever

> Reduce amount of allocations during batch statement execution
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Thomas Steinmaurer
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
> In a Cassandra 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, we see 4.0b2 going OOM from time to time. According to a heap dump, we have multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always only 1 {{BTreeRow}} in the {{BTree}}.
> !screenshot-1.png|width=100%!
> So it seems we have many, many pre-allocated object arrays of 20K elements each, resulting in a shallow heap of 80K each, although there is only one element in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
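The "20K elements, 80K shallow heap" figure in the description can be sanity-checked with simple arithmetic. This is a rough sketch under assumptions that are not stated in the ticket: a 64-bit HotSpot JVM with compressed object references (4 bytes per slot), a 16-byte array header, and 8-byte object alignment. The helper name is hypothetical.

```python
def object_array_shallow_size(length, ref_bytes=4, header_bytes=16):
    """Approximate shallow size in bytes of a JVM Object[] with `length` slots.

    Assumes compressed references (4 bytes each), a 16-byte array header and
    8-byte alignment. These are assumptions, not values from the heap dump.
    """
    raw = header_bytes + length * ref_bytes
    return (raw + 7) // 8 * 8  # round up to the next multiple of 8


preallocated = object_array_shallow_size(20_000)  # array sized for 20K rows
needed = object_array_shallow_size(1)             # array sized for the single row in use
print(preallocated, needed, preallocated - needed)
```

Under these assumptions almost all of each array is unused, so the waste scales directly with the number of such arrays alive at once.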
[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-16157:
    Fix Version/s: 4.0-beta3
    Since Version: 4.0-beta1
    Source Control Link: https://github.com/apache/cassandra/commit/5be83b6a72695253c552535d2b826209f144cc63
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

Committed to trunk with [5be83b6a72695253c552535d2b826209f144cc63|https://github.com/apache/cassandra/commit/5be83b6a72695253c552535d2b826209f144cc63]

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 4.0-beta3
>
> When trying to upgrade 3.0 to 4.0, we’re often running into a problem, if older node serves as a coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}
[cassandra] branch trunk updated: Fix NPEs when 3.0 messages get re-serialized for filtering on 4.0 nodes in in-JVM dtests.
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 5be83b6  Fix NPEs when 3.0 messages get re-serialized for filtering on 4.0 nodes in in-JVM dtests.
5be83b6 is described below

commit 5be83b6a72695253c552535d2b826209f144cc63
Author: Alex Petrov
AuthorDate: Thu Oct 1 17:00:12 2020 +0200

    Fix NPEs when 3.0 messages get re-serialized for filtering on 4.0 nodes in in-JVM dtests.

    Patch by Alex Petrov; reviewed by Yifan Cai and David Capwell for CASSANDRA-16157
---
 .../cassandra/distributed/impl/Instance.java       | 14 +-
 .../cassandra/distributed/impl/MessageImpl.java    | 13 -
 .../cassandra/distributed/upgrade/UpgradeTest.java | 22 ++
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 6ad0712..47e2b32 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@ -83,6 +83,7 @@ import org.apache.cassandra.gms.Gossiper;
 import org.apache.cassandra.gms.VersionedValue;
 import org.apache.cassandra.hints.HintsService;
 import org.apache.cassandra.index.SecondaryIndexManager;
+import org.apache.cassandra.io.IVersionedAsymmetricSerializer;
 import org.apache.cassandra.io.sstable.IndexSummaryManager;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.DataInputBuffer;
@@ -91,6 +92,7 @@ import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.locator.InetAddressAndPort;
 import org.apache.cassandra.net.Message;
 import org.apache.cassandra.net.MessagingService;
+import org.apache.cassandra.net.NoPayload;
 import org.apache.cassandra.net.Verb;
 import org.apache.cassandra.schema.Schema;
 import org.apache.cassandra.schema.SchemaConstants;
@@ -110,6 +112,7 @@ import org.apache.cassandra.tools.NodeTool;
 import org.apache.cassandra.tracing.TraceState;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.transport.messages.ResultMessage;
+import org.apache.cassandra.utils.ByteArrayUtil;
 import org.apache.cassandra.utils.DiagnosticSnapshotService;
 import org.apache.cassandra.utils.ExecutorUtils;
 import org.apache.cassandra.utils.FBUtilities;
@@ -285,9 +288,18 @@ public class Instance extends IsolatedExecutor implements IInvokableInstance
 
     private static IMessage serializeMessage(InetAddressAndPort from, InetAddressAndPort to, Message messageOut)
     {
+        int version = MessagingService.instance().versions.get(to);
+        if (messageOut.verb().serializer() == ((IVersionedAsymmetricSerializer) NoPayload.serializer) || messageOut.payload == null)
+        {
+            return new MessageImpl(messageOut.verb().id,
+                                   ByteArrayUtil.EMPTY_BYTE_ARRAY,
+                                   messageOut.id(),
+                                   version,
+                                   fromCassandraInetAddressAndPort(from));
+        }
+
         try (DataOutputBuffer out = new DataOutputBuffer(1024))
         {
-            int version = MessagingService.instance().versions.get(to);
             Message.serializer.serialize(messageOut, out, version);
             byte[] bytes = out.toByteArray();
             if (messageOut.serializedSize(version) != bytes.length)
diff --git a/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java b/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
index ebc31b1..607e890 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
@@ -21,7 +21,7 @@ package org.apache.cassandra.distributed.impl;
 import java.net.InetSocketAddress;
 
 import org.apache.cassandra.distributed.api.IMessage;
-import org.apache.cassandra.distributed.shared.NetworkTopology;
+import org.apache.cassandra.utils.ByteArrayUtil;
 
 // a container for simplifying the method signature for per-instance message handling/delivery
 public class MessageImpl implements IMessage
@@ -65,5 +65,16 @@ public class MessageImpl implements IMessage
     {
         return from;
     }
+
+    public String toString()
+    {
+        return "MessageImpl{" +
+               "verb=" + verb +
+               ", bytes=" + ByteArrayUtil.bytesToHex(bytes) +
+               ", id=" + id +
+               ", version=" + version +
+               ", from=" + from +
+               '}';
+    }
 }
diff --git a/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTest.java
[jira] [Commented] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213752#comment-17213752 ]

Alex Petrov commented on CASSANDRA-16157:

[~yifanc] I've added {{toString}} to message. Thank you for reviewing!

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
>
> When trying to upgrade 3.0 to 4.0, we’re often running into a problem, if older node serves as a coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213740#comment-17213740 ]

Berenguer Blasi commented on CASSANDRA-15996:

I have been focusing on this one today and I want to share my findings. Here is the stdout from David's test for the record:
{noformat}
AssertionError: Log message should be print for CAP and CAP_NOWARN policy
assert []

self =

    @since('2.1')
    def test_expiration_overflow_policy_cap(self):
>       self._base_expiration_overflow_policy_test(default_ttl=False, policy='CAP')

ttl_test.py:343:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = , default_ttl = False
policy = 'CAP'

    def _base_expiration_overflow_policy_test(self, default_ttl, policy):
        """
        Checks that expiration date overflow policy is correctly applied
        @jira_ticket CASSANDRA-14092
        """
        MAX_TTL = 20 * 365 * 24 * 60 * 60  # 20 years in seconds
        default_time_to_live = MAX_TTL if default_ttl else None
        self.prepare(default_time_to_live=default_time_to_live)

        # Restart node with expiration_date_overflow_policy
        self.cluster.stop()
        self.cluster.start(jvm_args=['-Dcassandra.expiration_date_overflow_policy={}'.format(policy)])
        self.session1 = self.patient_cql_connection(self.cluster.nodelist()[0])
        self.session1.execute("USE ks;")

        # Try to insert data, should only fail if policy is REJECT
        query = 'INSERT INTO ttl_table (key, col1) VALUES (%d, %d)' % (1, 1)
        if not default_time_to_live:
            query = query + "USING TTL %d" % (MAX_TTL)
        try:
            result = self.session1.execute_async(query + ";")
            result.result()
            if policy == 'REJECT':
                self.fail("should throw InvalidRequest")
            if self.cluster.version() >= '3.0':  # client warn only on 3.0+
                if policy == 'CAP':
                    logger.debug("Warning is {}", result.warnings[0])
                    assert 'exceeds maximum supported expiration' in result.warnings[0], 'Warning not found'
                else:
                    assert not result.warnings, "There should be no warnings"
        except InvalidRequest as e:
            if policy != 'REJECT':
                self.fail("should not throw InvalidRequest")

        self.cluster.flush()
        # Data should be present unless policy is reject
        assert_row_count(self.session1, 'ttl_table', 0 if policy == 'REJECT' else 1)

        # Check that warning is always logged, unless policy is REJECT
        if policy != 'REJECT':
            node1 = self.cluster.nodelist()[0]
            prefix = 'default ' if default_ttl else ''
            warning = node1.grep_log("Request on table {}.{} with {}ttl of {} seconds exceeds maximum supported expiration"
                                     .format('ks', 'ttl_table', prefix, MAX_TTL))
>           assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
E           AssertionError: Log message should be print for CAP and CAP_NOWARN policy
E           assert []

ttl_test.py:410: AssertionError
{noformat}
As we can see from the code above, we're being called with policy 'CAP'. Following the test code, we make it through to line 392, where we [check|https://github.com/apache/cassandra-dtest/blob/master/ttl_test.py#L392] that there was indeed a client warning. So the TTL 'business logic' is happening and it's correct. The only bit missing is the warning being logged, which falls on {{NoSpamLogger}}'s shoulders. I can only think of some edge case in {{NoSpamLogger}} where it fails to log, which would explain why it happens so seldom, why it hasn't been repro'ed so far and why I didn't manage to repro it either, even on a thinned down machine.

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >       assert warning, 'Log message should be print for CAP and CAP_NOWARN policy'
> E       AssertionError: Log message should be print for CAP and CAP_NOWARN policy
> E       assert []
> {code}
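The three policies exercised by the test can be modelled in a few lines. This is an illustrative sketch only, not Cassandra's implementation: it assumes expiration times are stored as seconds since the epoch in a signed 32-bit integer with the cap at `2**31 - 2`, and the names `expiration_time`, `MAX_DELETION_TIME` and the local `InvalidRequest` class are stand-ins introduced here. The server-side log line that the flaky assertion looks for is deliberately not modelled.

```python
MAX_DELETION_TIME = 2**31 - 2      # assumed cap, in seconds since the epoch
MAX_TTL = 20 * 365 * 24 * 60 * 60  # 20 years in seconds, as in the test


class InvalidRequest(Exception):
    """Stand-in for the driver's InvalidRequest exception."""


def expiration_time(now, ttl, policy):
    """Return (expiration_seconds, client_warning) for a write with a TTL.

    Illustrative model: REJECT raises, CAP caps and warns the client,
    CAP_NOWARN caps without a client warning.
    """
    expires = now + ttl
    if expires <= MAX_DELETION_TIME:
        return expires, None  # no overflow, nothing to do
    if policy == 'REJECT':
        raise InvalidRequest("exceeds maximum supported expiration date")
    warning = "exceeds maximum supported expiration date" if policy == 'CAP' else None
    return MAX_DELETION_TIME, warning


now = 1602633600  # a timestamp in October 2020, used only for illustration
print(expiration_time(now, MAX_TTL, 'CAP'))
print(expiration_time(now, MAX_TTL, 'CAP_NOWARN'))
```

With a 20-year TTL written in 2020, `now + ttl` exceeds the assumed cap, so both CAP variants take the capped path; per the comment above, the client warning for CAP was present in the failing run and only the server log entry was missing.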
[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node
[ https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-16207:
    Fix Version/s: 4.0-beta3
                   3.11.9
                   3.0.23
    Since Version: 3.0.21
    Source Control Link: https://github.com/apache/cassandra/commit/6eeca9d6cc482417fd4564302baa349ed76fd7ec
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

Committed to 3.0 with [6eeca9d6cc482417fd4564302baa349ed76fd7ec|https://github.com/apache/cassandra/commit/6eeca9d6cc482417fd4564302baa349ed76fd7ec] and merged to [3.11|https://github.com/apache/cassandra/commit/d3f7bdfe017cd236779cbac0b788ab8a3c619278] and [trunk|https://github.com/apache/cassandra/commit/83033075d334997298dc6937dc64067de76a3077].

> NPE when calling broadcast address on unintialized node
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
> When trying to run upgrades, we’re sometimes calling broadcast address on an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>     at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>     at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>     at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> {code}
[cassandra] branch cassandra-3.11 updated (45982f5 -> d3f7bdf)
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 45982f5  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 6eeca9d  Fix NPE when calling broadcast address on unintialized node
     add d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 24 ++
 1 file changed, 24 insertions(+)
[cassandra] branch trunk updated (d890b7a -> 8303307)
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from d890b7a  Merge branch 'cassandra-3.11' into trunk
     add 6eeca9d  Fix NPE when calling broadcast address on unintialized node
     add d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 8303307  Merge branch 'cassandra-3.11' into trunk

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 25 ++
 1 file changed, 25 insertions(+)
[cassandra] branch cassandra-3.0 updated (0700dfa -> 6eeca9d)
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 0700dfa  Check SSTables for latest version before dropping compact storage
     add 6eeca9d  Fix NPE when calling broadcast address on unintialized node

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 24 ++
 1 file changed, 24 insertions(+)
[jira] [Commented] (CASSANDRA-16163) Rename master branches to trunk in all repositories
[ https://issues.apache.org/jira/browse/CASSANDRA-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213710#comment-17213710 ]

Michael Semb Wever commented on CASSANDRA-16163:

Instructions

1. In a git clone, create the trunk branch (as a rename of master)
{code}
git branch -m master trunk
git branch --unset-upstream
git push -u origin trunk
{code}
2. Open an INFRA ticket, asking for the upstream default branch to be changed
3. Inform developers of the change.

> Rename master branches to trunk in all repositories
>
> Key: CASSANDRA-16163
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16163
> Project: Cassandra
> Issue Type: Task
> Components: Build
> Reporter: Michael Semb Wever
> Priority: Normal
>
> Applies to the following repositories
> * cassandra-builds
> * cassandra-website
> * cassandra-dtest
> * cassandra-sidecar
> * cassandra-diff
> * cassandra-in-jvm-dtest-api
> * cassandra-harry
> This was discussed in https://lists.apache.org/thread.html/r54db4cd870d2d665060d5fb50d925843be4b4d54dc64f3d21f04c367%40%3Cdev.cassandra.apache.org%3E
> The general preference there was trunk over main, so as to match the cassandra repository.
[jira] [Updated] (CASSANDRA-16168) Rename master branch to trunk in cassandra-diff
[ https://issues.apache.org/jira/browse/CASSANDRA-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16168:
    Change Category: Semantic
    Complexity: Normal
    Component/s: Build
    Status: Open (was: Triage Needed)

> Rename master branch to trunk in cassandra-diff
>
> Key: CASSANDRA-16168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16168
> Project: Cassandra
> Issue Type: Sub-task
> Components: Build
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
[cassandra-diff] branch trunk created (now 4c9bc4f)
This is an automated email from the ASF dual-hosted git repository.

mck pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-diff.git.

      at 4c9bc4f  Allow optional query retry

No new revisions were added by this update.
[jira] [Commented] (CASSANDRA-16164) Rename master branch to trunk in cassandra-builds
[ https://issues.apache.org/jira/browse/CASSANDRA-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213704#comment-17213704 ]

Michael Semb Wever commented on CASSANDRA-16164:

Waiting on INFRA-20982

> Rename master branch to trunk in cassandra-builds
>
> Key: CASSANDRA-16164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16164
> Project: Cassandra
> Issue Type: Sub-task
> Components: Build
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
[cassandra-builds] branch trunk created (now 5e17c9b)
This is an automated email from the ASF dual-hosted git repository.

mck pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git.

      at 5e17c9b  Reduce CCM heap settings to match those in circleci, and limit docker containers to 15g memory (and disable swapping) (INFRA-20107)

No new revisions were added by this update.
[jira] [Updated] (CASSANDRA-16164) Rename master branch to trunk in cassandra-builds
[ https://issues.apache.org/jira/browse/CASSANDRA-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16164:
    Change Category: Semantic
    Complexity: Normal
    Component/s: Build
    Status: Open (was: Triage Needed)

> Rename master branch to trunk in cassandra-builds
>
> Key: CASSANDRA-16164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16164
> Project: Cassandra
> Issue Type: Sub-task
> Components: Build
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth
[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213681#comment-17213681 ]

Marcus Eriksson commented on CASSANDRA-15369:

Looks good in general, two concerns:
* performance of {{SinglePartitionReadCommand#reduceFilter}} is much worse now (a silly laptop local benchmark shows queries being 15% slower) - the reason seems to be that we use {{try (UnfilteredRowIterator iterator = result.unfilteredIterator(columnFilter(), filter.getSlices(metadata()), false))}} - I think we can just replace that with {{try (UnfilteredRowIterator iterator = result.unfilteredIterator(columnFilter(), clusterings, false))}}?
* {{AbstractBTreePartition#getRow}} - this looks like it is missing the fix from CASSANDRA-15363 - the {{row == null}} case should probably be
{code}
// this means our partition level deletion supersedes all other deletions and we don't have to keep the row deletions
if (activeDeletion == partitionDeletion)
    return null;
// no need to check activeDeletion.isLive here - if anything supersedes the partitionDeletion
// it must be non-live
return BTreeRow.emptyDeletedRow(clustering, Row.Deletion.regular(activeDeletion));
{code}

> Fake row deletions and range tombstones, causing digest mismatch and sstable growth
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Local/Memtable, Local/SSTable
> Reporter: Benedict Elliott Smith
> Assignee: Zhao Yang
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake tombstone markers under various circumstances:
> * If we perform a clustering key query (or select a compact column):
> * Serving from a {{Memtable}}, we will generate fake row deletions
> * Serving from an sstable, we will generate fake row tombstone markers
> * If we perform a slice query, we will generate only fake row tombstone markers for any range tombstone that begins or ends outside of the limit of the requested slice
> * If we perform a multi-slice or IN query, this will occur for each slice/clustering
> Unfortunately, these different behaviours can lead to very different data stored in sstables until a full repair is run. When we read-repair, we only send these fake deletions or range tombstones. A fake row deletion, clustering RT and slice RT, each produces a different digest. So for each single point lookup we can produce a digest mismatch twice, and until a full repair is run we can encounter an unlimited number of digest mismatches across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures caused by our monotonic reads, since RTs can have an atomic effect across (up to) the entire partition, whereas the propagation may happen on an arbitrarily small portion. If the RT exists on only one node, this could plausibly lead to a fairly problematic scenario if that node fails before the range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of extraneous data being stored until the range is repaired and compaction happens to overwrite the sub-range RTs and row deletions.
[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution
[ https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16201:
    Since Version: 3.0 alpha 1

> Reduce amount of allocations during batch statement execution
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Thomas Steinmaurer
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
> In a Cassandra 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, we see 4.0b2 going OOM from time to time. According to a heap dump, we have multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always only 1 {{BTreeRow}} in the {{BTree}}.
> !screenshot-1.png|width=100%!
> So it seems we have many, many pre-allocated object arrays of 20K elements each, resulting in a shallow heap of 80K each, although there is only one element in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution
[ https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16201:
---
    Fix Version/s: 3.0.x
[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-15430:
---
    Fix Version/s:     (was: 4.0.x)
                       (was: 3.11.x)
    Since Version: 3.0 alpha 1

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations
> compared to 2.1.18
>
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Fix For: 3.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png,
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png,
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png,
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png,
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6 node loadtest cluster, we have been running a certain
> production-like workload with 2.1.18 constantly and sufficiently. After
> upgrading one node to 3.0.18 (remaining 5 still on 2.1.18 after we have seen
> that sort of regression described below), 3.0.18 is showing increased CPU
> usage, increased GC, high mutation stage pending tasks, dropped mutation
> messages ...
> Some specs. All 6 nodes equally sized:
> * Bare metal, 32 physical cores, 512G RAM
> * Xmx31G, G1, max pause millis = 2000ms
> * cassandra.yaml basically unchanged, thus same settings in regard to number
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with
> metrics for all 6 nodes and the one outlier being the node upgraded to
> Cassandra 3.0.18.
> !dashboard.png|width=1280!
> Additionally we see a large increase in pending tasks in the mutation stage
> after the upgrade:
> !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name               Active   Pending    Completed   Blocked   All Time Blocked
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage              256     81824   3360532756         0                  0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage            0         0            0         0                  0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                    0         0     62862266         0                  0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage         0         0   2176659856         0                  0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage              0         0            0         0                  0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage         0         0            0         0                  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 and 2.1.18 on a different
> node, high-level, it looks like the code path underneath
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in
> 3.0.18 compared to 2.1.18.
> !jfr_allocations.png!
> Left => 3.0.18
> Right => 2.1.18
> JFRs zipped are exceeding the 60MB limit to directly attach to the ticket. I
> can upload them, if there is another destination available.
[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-15430:
---
    Resolution: Duplicate  (was: Fixed)
    Status: Resolved  (was: Open)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations
> compared to 2.1.18
>
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-15430:
---
    Status: Open  (was: Resolved)
[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213675#comment-17213675 ]

Michael Semb Wever commented on CASSANDRA-15430:

[~tsteinmaurer], under CASSANDRA-16201 [~marcuse] and I plan to address the issues also here.

To have it stated, 16201 also needs to include for 3.0,
 1. add initialCapacity to BTree$Builder,
 2. make sure initialCapacity is sane,
 3. add an initialCapacity to MultiCBuilder
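The three items in the CASSANDRA-15430 comment above share one idea: let the caller size the backing array, and bound that size so a bad hint cannot cause a large up-front allocation. A minimal sketch of that idea follows. `CapacityBoundedBuilder`, its constants, and its methods are hypothetical and are not the API of Cassandra's `BTree$Builder` or `MultiCBuilder`.

```java
import java.util.Arrays;

// Hypothetical illustration only; the default and the cap are assumed values.
final class CapacityBoundedBuilder<T> {
    static final int DEFAULT_CAPACITY = 16;       // used when the caller gives no usable hint
    static final int MAX_INITIAL_CAPACITY = 1024; // upper bound on any up-front allocation

    private Object[] values;
    private int count;

    CapacityBoundedBuilder(int requestedCapacity) {
        this.values = new Object[saneCapacity(requestedCapacity)];
    }

    /** Clamps the caller's hint: non-positive hints fall back to the default, large hints are capped. */
    static int saneCapacity(int requested) {
        if (requested <= 0)
            return DEFAULT_CAPACITY;
        return Math.min(requested, MAX_INITIAL_CAPACITY);
    }

    /** Appends a value, doubling the backing array only when it is actually full. */
    CapacityBoundedBuilder<T> add(T value) {
        if (count == values.length)
            values = Arrays.copyOf(values, values.length * 2);
        values[count++] = value;
        return this;
    }

    int size() { return count; }

    int allocatedSlots() { return values.length; }

    /** Returns a copy trimmed to the number of values added. */
    Object[] build() { return Arrays.copyOf(values, count); }
}
```

With this shape, a caller that knows it will add a single row can pass `1` and pay for one slot, while a hint of 20,000 is capped rather than honoured in full.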