[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215186#comment-17215186 ] Berenguer Blasi commented on CASSANDRA-15996: - That could be it indeed imo. Given NoSpamLogger is to be used in hot paths and the 'currentTimeMillis' resolution issues, I'd go for the 'Long.MIN_VALUE' route. That also keeps things within the 'nanotime()' world, so to speak, so we don't inadvertently introduce some perf profile change. Also, the [https://stackoverflow.com/a/54566928/3432945|http://example.com] read was interesting. I have +1'ed the 'Long.MIN_VALUE' PR, pending somebody who knows about the upgrade test failures confirming they are indeed unrelated. Nice catch, whether it turns out to be it or not! :-) > Fix flaky python dtest test_expiration_overflow_policy_capnowarn - > ttl_test.TestTTL > --- > > Key: CASSANDRA-15996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15996 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860 > {code} > > assert warning, 'Log message should be print for CAP and > > CAP_NOWARN policy' > E AssertionError: Log message should be print for CAP and > CAP_NOWARN policy > E assert [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
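For readers following the 'Long.MIN_VALUE' discussion above, here is a minimal sketch of the general idea, not the actual patch (the PR contents are not quoted in this thread). It assumes a rate-limited statement that remembers when it last logged using System.nanoTime() values, which have an arbitrary origin and may be small or negative. The class RateLimitedStatement and its method shouldLog are hypothetical names, not Cassandra's NoSpamLogger API. The sentinel means "never logged yet", so the very first message is always emitted regardless of what the clock returns.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a rate-limited log statement; NOT Cassandra's NoSpamLogger.
final class RateLimitedStatement
{
    // Sentinel meaning "nothing has been logged yet".
    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalNanos;
    private final AtomicLong lastLoggedNanos = new AtomicLong(NEVER);

    RateLimitedStatement(long minIntervalNanos)
    {
        this.minIntervalNanos = minIntervalNanos;
    }

    // nowNanos is expected to come from System.nanoTime().
    boolean shouldLog(long nowNanos)
    {
        long last = lastLoggedNanos.get();
        // Test the sentinel explicitly: nowNanos - Long.MIN_VALUE would overflow.
        boolean due = last == NEVER || nowNanos - last >= minIntervalNanos;
        // compareAndSet lets only one of several racing threads win the right to log.
        return due && lastLoggedNanos.compareAndSet(last, nowNanos);
    }
}
```

With a zero-initialised timestamp instead of the sentinel, a first call with a clock value smaller than the interval would be suppressed, which is one way an expected warning could go missing from a log.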
[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734 ] Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:57 AM: The issue is that it was possible to open a new Keyspace instance in the middle of Schema.dropKeyspace(). To see the problem the drop has to progress to the following [state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]: 1) Keyspace instance doesn't exist - it has been already removed. 2) KeyspaceMetadata still exists Keyspace.open in this state creates a new Keyspace instance (with ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is an object leak. [3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] [4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62] CI run: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a] was (Author: e.dimitrova): The issue is that it was possible to open a new Keyspace instance in the middle of Schema.dropKeyspace(). To see the problem the drop has to progress to the following [state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]: 1) Keyspace instance doesn't exist - it has been already removed. 2) KeyspaceMetadata still exists Keyspace.open in this state creates a new Keyspace instance (with ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is an object leak. 
[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] [4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62] CI run: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a] > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was reported which pointed to a race condition > to be spotted.
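The race described in the comment above (a new Keyspace instance being opened and stored while Schema.dropKeyspace() has removed the instance but not yet the metadata) can be illustrated with a small model. This is a sketch under assumptions, not the linked patches: SchemaModel, its two maps, and its method names are hypothetical simplifications. The point it shows is that if open and drop take the same lock, open can never observe the intermediate state "instance gone, metadata still present".

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a schema registry; names are illustrative only.
final class SchemaModel
{
    private final Map<String, String> metadata = new HashMap<>();  // keyspace name -> metadata
    private final Map<String, Object> instances = new HashMap<>(); // keyspace name -> opened instance

    synchronized void create(String name)
    {
        metadata.put(name, "metadata:" + name);
    }

    // Checks the metadata and stores the instance under the same lock that drop() takes.
    synchronized Object open(String name)
    {
        if (!metadata.containsKey(name))
            throw new IllegalStateException("Unknown keyspace " + name);
        return instances.computeIfAbsent(name, n -> new Object());
    }

    // Removes instance and metadata as one step with respect to open().
    synchronized void drop(String name)
    {
        instances.remove(name);
        metadata.remove(name);
    }

    synchronized int openInstanceCount()
    {
        return instances.size();
    }
}
```

Without the shared lock, a thread calling open between the two removals in drop would store a fresh instance that nothing ever clears, which is the object leak the comment describes.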
[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734 ] Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:56 AM: The issue is that it was possible to open a new Keyspace instance in the middle of Schema.dropKeyspace(). To see the problem the drop has to progress to the following [state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]: 1) Keyspace instance doesn't exist - it has been already removed. 2) KeyspaceMetadata still exists Keyspace.open in this state creates a new Keyspace instance (with ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is an object leak. [3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] [4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62] CI run: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a] was (Author: e.dimitrova): The issue is, it was possible to open a new Keyspace instance in the middle of Schema.dropKeyspace(). To see the problem the drop has to progress to the following [state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]: 1) Keyspace instance doesn't exist - it has been already removed. 2) KeyspaceMetadata still exists Keyspace.open in this state creates a new Keyspace instance (with ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is an object leak. 
[3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] [4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62] CI run: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a] > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was reported which pointed to a race condition > to be spotted.
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Test and Documentation Plan: (was: It turned out the issue is already solved for 4.0 with CASSANDRA-9425 Posting patch for [3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 |https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] ) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was reported which pointed to a race condition > to be spotted.
[jira] [Comment Edited] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734 ] Ekaterina Dimitrova edited comment on CASSANDRA-16210 at 10/16/20, 3:55 AM: The issue is, it was possible to open a new Keyspace instance in the middle of Schema.dropKeyspace(). To see the problem the drop has to progress to the following [state|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L657]: 1) Keyspace instance doesn't exist - it has been already removed. 2) KeyspaceMetadata still exists Keyspace.open in this state creates a new Keyspace instance (with ColumnFamilyStore instances) and stores it in Schema.keyspaceInstances. This is an object leak. [3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] [4.0 | https://github.com/ekaterinadimitrova2/cassandra/pull/62] CI run: [Java 8 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9d0905bd-6ca6-480a-862b-35d5842ed5ef] | [Java 11 | https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/414/workflows/9476a603-a494-4da3-bf69-9498d40ae29a] was (Author: e.dimitrova): It turned out the issue is already solved for 4.0 with CASSANDRA-9425 Posting patch for [3.11 | https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was 
reported which pointed to a race condition > to be spotted.
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Fix Version/s: 4.0-beta3 > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x, 4.0-beta3 > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was reported which pointed to a race condition > to be spotted.
[jira] [Comment Edited] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns
[ https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215155#comment-17215155 ] Jordan West edited comment on CASSANDRA-16048 at 10/16/20, 3:55 AM: Updated the branch to address [~marcuse]'s comment re: updating the flags in {{system_schema.tables}}. Had to move things around to account for the changes in CASSANDRA-16063. Updated the test as well. I skipped adding a flag since we can't detect and "undo" updating the tables that were updated. If folks feel strongly about the flag I can add it. Branch: [https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048] Tests: [https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048] was (Author: jrwest): Updated the branch to address [~marcuse]'s comment re: updated the flags in {{system_schema.tables}}. Had to move things around to account for the changes in CASSANDRA-16063. Updated the test as well. I skipped adding a flag since we can't detect and "undo" updating the tables that were updated. If folks feel strongly about the flag I can add it. Branch: https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048 Tests: https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048 > Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and > Value Columns > -- > > Key: CASSANDRA-16048 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16048 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Jordan West >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > Some compact storage tables, specifically those where the user has defined > both at least one clustering and the value column, can be safely handled in > 4.0 because besides the DENSE flag they are not materially different post 3.0 > and there is no visible change to the user facing schema after dropping > compact storage. 
We can detect this case and allow these tables to silently > drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE > tables that don’t meet the criteria.
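The rule stated in the issue description above can be written down as a small decision function. This is only a sketch of the stated criteria, not the code on the linked branch: CompactTableCheck, the Action enum, and the three parameters are hypothetical and stand in for whatever the real schema-loading code inspects.

```java
// Hypothetical illustration of the start-up decision described in the ticket.
final class CompactTableCheck
{
    enum Action { NOTHING, DROP_DENSE_FLAG, FAIL_STARTUP }

    // compactStorage: the table was created WITH COMPACT STORAGE (assumed input)
    // clusteringColumns: number of user-defined clustering columns (assumed input)
    // userDefinedValueColumn: the user defined the value column (assumed input)
    static Action onStartup(boolean compactStorage, int clusteringColumns, boolean userDefinedValueColumn)
    {
        if (!compactStorage)
            return Action.NOTHING;
        // At least one clustering column plus a user-defined value column: per the
        // description, only the DENSE flag differs, so it can be dropped silently.
        if (clusteringColumns >= 1 && userDefinedValueColumn)
            return Action.DROP_DENSE_FLAG;
        // Any other COMPACT STORAGE table still triggers a start-up error.
        return Action.FAIL_STARTUP;
    }
}
```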
[jira] [Commented] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns
[ https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215155#comment-17215155 ] Jordan West commented on CASSANDRA-16048: - Updated the branch to address [~marcuse]'s comment re: updated the flags in {{system_schema.tables}}. Had to move things around to account for the changes in CASSANDRA-16063. Updated the test as well. I skipped adding a flag since we can't detect and "undo" updating the tables that were updated. If folks feel strongly about the flag I can add it. Branch: https://github.com/apache/cassandra/compare/trunk...jrwest:jwest/16048 Tests: https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F16048 > Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and > Value Columns > -- > > Key: CASSANDRA-16048 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16048 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Jordan West >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > Some compact storage tables, specifically those where the user has defined > both at least one clustering and the value column, can be safely handled in > 4.0 because besides the DENSE flag they are not materially different post 3.0 > and there is no visible change to the user facing schema after dropping > compact storage. We can detect this case and allow these tables to silently > drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE > tables that don’t meet the criteria.
[jira] [Commented] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215124#comment-17215124 ] maxwellguo commented on CASSANDRASC-27: --- Thank you [~tharanga] . > CDC reader in Apache Cassandra Sidecar > -- > > Key: CASSANDRASC-27 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-27 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature >Reporter: Vinay Chella >Assignee: Tharanga Sampath Gamaethige >Priority: Normal > > Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This > is further enhanced with (CASS-12148) in Cassandra 4.0. > However, there’s no generally available mechanism to stream changes out of a > Cassandra database; hence the utility of this feature is limited if not > absent. > Many applications use Cassandra as their primary data store. For various > reasons(Caching, analyzing, indexing, etc), this data needs to be > synchronized with derived/secondary data stores. We would like to emit > change streams in real-time to consumers so that changes to Cassandra can be > used for various purposes. > *Goals* > * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit > changes in real-time. Priority for the initial implementation is safety and > correctness, performance enhancements will follow in subsequent iterations > *Nongoals* > * Modify Cassandra storage engine to emit changes > > *Proposal* > [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing] > > *PR* > https://github.com/apache/cassandra-sidecar/pull/16
[jira] [Assigned] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maxwellguo reassigned CASSANDRASC-27: - Assignee: Tharanga Sampath Gamaethige (was: maxwellguo) > CDC reader in Apache Cassandra Sidecar > -- > > Key: CASSANDRASC-27 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-27 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature >Reporter: Vinay Chella >Assignee: Tharanga Sampath Gamaethige >Priority: Normal > > Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This > is further enhanced with (CASS-12148) in Cassandra 4.0. > However, there’s no generally available mechanism to stream changes out of a > Cassandra database; hence the utility of this feature is limited if not > absent. > Many applications use Cassandra as their primary data store. For various > reasons(Caching, analyzing, indexing, etc), this data needs to be > synchronized with derived/secondary data stores. We would like to emit > change streams in real-time to consumers so that changes to Cassandra can be > used for various purposes. > *Goals* > * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit > changes in real-time. Priority for the initial implementation is safety and > correctness, performance enhancements will follow in subsequent iterations > *Nongoals* > * Modify Cassandra storage engine to emit changes > > *Proposal* > [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing] > > *PR* > https://github.com/apache/cassandra-sidecar/pull/16
[jira] [Comment Edited] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215014#comment-17215014 ] Adam Holmberg edited comment on CASSANDRA-15996 at 10/15/20, 9:24 PM: -- Created two patches for consideration |[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]| |[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]| (also fixing what I believe to be incorrect behavior shown in one of the unit tests) was (Author: aholmber): Created two patches for consideration |[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]| |[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]| > Fix flaky python dtest test_expiration_overflow_policy_capnowarn - > ttl_test.TestTTL > --- > > Key: CASSANDRA-15996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15996 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860 > {code} > > assert warning, 'Log message should be print for CAP and > > CAP_NOWARN policy' > E AssertionError: Log message should be print for CAP and > CAP_NOWARN policy > E assert [] > {code}
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215014#comment-17215014 ] Adam Holmberg commented on CASSANDRA-15996: --- Created two patches for consideration |[currentTimeMillis|https://github.com/aholmberg/cassandra/pull/13/files#diff-e2c5319b6d6b31133eb6f8daf05716ee2358471ae66ac8dedb1df5fd669e088b]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996]| |[Long.MIN_VALUE|https://github.com/aholmberg/cassandra/pull/14]|[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15996-alt]| > Fix flaky python dtest test_expiration_overflow_policy_capnowarn - > ttl_test.TestTTL > --- > > Key: CASSANDRA-15996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15996 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860 > {code} > > assert warning, 'Log message should be print for CAP and > > CAP_NOWARN policy' > E AssertionError: Log message should be print for CAP and > CAP_NOWARN policy > E assert [] > {code}
[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16213: Reviewers: Brandon Williams, Paulo Motta > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well)
[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214972#comment-17214972 ] David Capwell commented on CASSANDRA-16213: --- [~paulo], Brandon told me in Slack that you would be a good person to review as well; would you be able to? > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well)
[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214966#comment-17214966 ] David Capwell commented on CASSANDRA-16213: --- Sorry, I misspoke, on startup we do add it back into the ring, see https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/service/StorageService.java#L604-L617. So currently, each node will add it back into the ring, and will add it back into gossip. > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well)
[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214963#comment-17214963 ] David Capwell commented on CASSANDRA-16213: --- Thanks for the reply [~brandon.williams]! bq. If you shutdown the entire ring in non-rolling fashion then it is no surprise We see this in rolling fashion as well; a full-cluster shutdown was just easier to reproduce, so the issue isn't isolated to a full-cluster outage. bq. You can no longer replace as a consequence What is the recommendation in these cases? bq. A node injecting states that don't belong to itself is generally forbidden as it is dangerous In the case I call out we don't add the node to the ring, but we do add it to gossip, see https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/gms/Gossiper.java#L1754-L1780. We will try to evict it from gossip (see https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/gms/Gossiper.java#L960-L969), but we also see in the wild that this eviction doesn't happen and it stays there forever; here is a sample from gossipinfo on a real cluster {code} / generation:0 heartbeat:0 TOKENS: not present {code} > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. 
> A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well)
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Description: DTest failure: dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table (vnodes) - one random failure was reported which pointed to a race condition to be spotted. (was: DTest failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was reported which pointed to a race condition to be spotted. ) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > DTest failure: > dtest-large.repair_tests.repair_test.TestRepairDataSystemTable.test_repair_table > (vnodes) - one random failure was reported which pointed to a race condition > to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Description: DTest failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was reported which pointed to a race condition to be spotted. (was: Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one random failure was reported which pointed to a race condition to be spotted. ) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > DTest failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Discovered By: User Report (was: Unit Test) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214938#comment-17214938 ] Tharanga Sampath Gamaethige commented on CASSANDRASC-27: WIP version of the PR is out : https://github.com/apache/cassandra-sidecar/pull/16 > CDC reader in Apache Cassandra Sidecar > -- > > Key: CASSANDRASC-27 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-27 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature >Reporter: Vinay Chella >Assignee: maxwellguo >Priority: Normal > > Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This > is further enhanced with (CASS-12148) in Cassandra 4.0. > However, there’s no generally available mechanism to stream changes out of a > Cassandra database; hence the utility of this feature is limited if not > absent. > Many applications use Cassandra as their primary data store. For various > reasons(Caching, analyzing, indexing, etc), this data needs to be > synchronized with derived/secondary data stores. We would like to emit > change streams in real-time to consumers so that changes to Cassandra can be > used for various purposes. > *Goals* > * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit > changes in real-time. Priority for the initial implementation is safety and > correctness, performance enhancements will follow in subsequent iterations > *Nongoals* > * Modify Cassandra storage engine to emit changes > > *Proposal* > [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing] > > *PR* > https://github.com/apache/cassandra-sidecar/pull/16 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRASC-27) CDC reader in Apache Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tharanga Sampath Gamaethige updated CASSANDRASC-27: --- Description: Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This is further enhanced with (CASS-12148) in Cassandra 4.0. However, there’s no generally available mechanism to stream changes out of a Cassandra database; hence the utility of this feature is limited if not absent. Many applications use Cassandra as their primary data store. For various reasons(Caching, analyzing, indexing, etc), this data needs to be synchronized with derived/secondary data stores. We would like to emit change streams in real-time to consumers so that changes to Cassandra can be used for various purposes. *Goals* * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit changes in real-time. Priority for the initial implementation is safety and correctness, performance enhancements will follow in subsequent iterations *Nongoals* * Modify Cassandra storage engine to emit changes *Proposal* [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing] *PR* https://github.com/apache/cassandra-sidecar/pull/16 was: Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This is further enhanced with (CASS-12148) in Cassandra 4.0. However, there’s no generally available mechanism to stream changes out of a Cassandra database; hence the utility of this feature is limited if not absent. Many applications use Cassandra as their primary data store. For various reasons(Caching, analyzing, indexing, etc), this data needs to be synchronized with derived/secondary data stores. We would like to emit change streams in real-time to consumers so that changes to Cassandra can be used for various purposes. *Goals* * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit changes in real-time. 
Priority for the initial implementation is safety and correctness, performance enhancements will follow in subsequent iterations *Nongoals* * Modify Cassandra storage engine to emit changes *Proposal* https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing > CDC reader in Apache Cassandra Sidecar > -- > > Key: CASSANDRASC-27 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-27 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature >Reporter: Vinay Chella >Assignee: maxwellguo >Priority: Normal > > Apache Cassandra has the CDC (Change Data Capture) features since 3.8. This > is further enhanced with (CASS-12148) in Cassandra 4.0. > However, there’s no generally available mechanism to stream changes out of a > Cassandra database; hence the utility of this feature is limited if not > absent. > Many applications use Cassandra as their primary data store. For various > reasons(Caching, analyzing, indexing, etc), this data needs to be > synchronized with derived/secondary data stores. We would like to emit > change streams in real-time to consumers so that changes to Cassandra can be > used for various purposes. > *Goals* > * Enhance Apache Cassandra sidecar with a CDC reader that can read and emit > changes in real-time. Priority for the initial implementation is safety and > correctness, performance enhancements will follow in subsequent iterations > *Nongoals* > * Modify Cassandra storage engine to emit changes > > *Proposal* > [https://docs.google.com/document/d/11YywfJTm29szZOVOSRbtfvClbmMQtJ8WyCB7_CUgo-U/edit?usp=sharing] > > *PR* > https://github.com/apache/cassandra-sidecar/pull/16 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-16057: -- Status: Ready to Commit (was: Review In Progress) > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214937#comment-17214937 ] Adam Holmberg commented on CASSANDRA-15996: --- bq. I think we should switch NoSpamLogger to use currentTimeMillis. This, or we could initialize the {{NoSpamLogStatement}} to {{Long.MIN_VALUE}} instead of zero. I have the changes for either. > Fix flaky python dtest test_expiration_overflow_policy_capnowarn - > ttl_test.TestTTL > --- > > Key: CASSANDRA-15996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15996 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860 > {code} > > assert warning, 'Log message should be print for CAP and > > CAP_NOWARN policy' > E AssertionError: Log message should be print for CAP and > CAP_NOWARN policy > E assert [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214936#comment-17214936 ] David Capwell commented on CASSANDRA-16057: --- +1 > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-16057: -- Reviewers: Alex Petrov, David Capwell, David Capwell (was: Alex Petrov, David Capwell) Status: Review In Progress (was: Patch Available) > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214935#comment-17214935 ] Brandon Williams commented on CASSANDRA-16213: -- This affects all versions since the inception of replacement. If you shutdown the entire ring in non-rolling fashion then it is no surprise that any gossip state not persisted (and specific to an existing live node, which will repopulate it) will be lost. You can no longer replace as a consequence. A node injecting states that don't belong to itself is generally forbidden as it is dangerous, with the exception that proves the rule being assassinate (which also sleeps to be careful). No node should need to know about any dead states upon a full ring restart, with the exception of replacement. > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. 
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-16210: --- Reviewers: Michael Semb Wever, Michael Semb Wever (was: Michael Semb Wever) Status: Review In Progress (was: Patch Available) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-16213: -- Test and Documentation Plan: tests added Status: Patch Available (was: Open) > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-16213: -- Bug Category: Parent values: Availability(12983)Level 1 values: Unavailable(12994) Complexity: Challenging Discovered By: User Report Fix Version/s: 4.0-beta Severity: Critical Status: Open (was: Triage Needed) > Cannot replace_address /X because it doesn't exist in gossip > > > Key: CASSANDRA-16213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip, Cluster/Membership >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > We see this exception around nodes crashing and trying to do a host > replacement; this error appears to be correlated around multiple node > failures. > A simplified case to trigger this is the following > *) Have a N node cluster > *) Shutdown all N nodes > *) Bring up N-1 nodes (at least 1 seed, else replace seed) > *) Host replace the N-1th node -> this will fail with the above > The reason this happens is that the N-1th node isn’t gossiping anymore, and > the existing nodes do not have its details in gossip (but have the details in > the peers table), so the host replacement fails as the node isn’t known in > gossip. > This affects all versions (tested 3.0 and trunk, assume 2.2 as well) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip
David Capwell created CASSANDRA-16213: - Summary: Cannot replace_address /X because it doesn't exist in gossip Key: CASSANDRA-16213 URL: https://issues.apache.org/jira/browse/CASSANDRA-16213 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip, Cluster/Membership Reporter: David Capwell Assignee: David Capwell We see this exception around nodes crashing and trying to do a host replacement; this error appears to be correlated around multiple node failures. A simplified case to trigger this is the following *) Have a N node cluster *) Shutdown all N nodes *) Bring up N-1 nodes (at least 1 seed, else replace seed) *) Host replace the N-1th node -> this will fail with the above The reason this happens is that the N-1th node isn’t gossiping anymore, and the existing nodes do not have its details in gossip (but have the details in the peers table), so the host replacement fails as the node isn’t known in gossip. This affects all versions (tested 3.0 and trunk, assume 2.2 as well) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
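The reproduction steps in the CASSANDRA-16213 description can be sketched as a toy model. This is only an illustration of the reasoning in the ticket, not Cassandra's Gossiper: the class ToyNode, the functions gossip_round and can_replace, the 10.0.0.x addresses, and the rule "replacement requires the dead address in a live node's gossip state" are all assumptions made for the example.

```python
class ToyNode:
    """Toy stand-in for a cluster member; not Cassandra's Gossiper."""

    def __init__(self, address):
        self.address = address
        self.peers_table = set()   # persisted: survives a restart
        self.gossip_state = set()  # in-memory only: lost on shutdown
        self.up = True

    def shutdown(self):
        self.up = False
        self.gossip_state = set()

    def start(self):
        self.up = True


def gossip_round(nodes):
    """Every live node learns about every other live node."""
    live = [n for n in nodes if n.up]
    for node in live:
        for other in live:
            if other is not node:
                node.gossip_state.add(other.address)
                node.peers_table.add(other.address)


def can_replace(live_nodes, dead_address):
    """Assumed rule: the dead address must be known in some live node's gossip state."""
    return any(dead_address in n.gossip_state for n in live_nodes)


# Have an N node cluster (N = 3, hypothetical addresses).
nodes = [ToyNode(f"10.0.0.{i}") for i in range(1, 4)]
gossip_round(nodes)

# Shut down all N nodes, then bring up N-1 of them.
for n in nodes:
    n.shutdown()
for n in nodes[:-1]:
    n.start()
gossip_round(nodes)

survivors = nodes[:-1]
lost = nodes[-1].address
known_in_peers = all(lost in n.peers_table for n in survivors)
replaceable = can_replace(survivors, lost)
```

In this model the lost node is still recorded in every survivor's peers table, yet no survivor has it in gossip, so the assumed replacement check fails. That mirrors the mismatch the description points at.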
[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL
[ https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214926#comment-17214926 ] Adam Holmberg commented on CASSANDRA-15996: --- bq. instead of relying on the patient cql connection, lets add flags to wait for the binary protocol and other startup stuff to complete The test already waits for binary protocol log-wise, and the connection is established after that. I'm not sure what else we would add. bq. NoSpamLogger has some shuffling of instances around that maybe have a concurrency hole, maybe I am just imagining things. I've stared at this quite a bit and I am reasonably confident there is not an issue with those mappings. Part of the reasoning is that, as we have mentioned, there is only a single request in flight. The other part is that no matter what kind of race we could come up with, the worst-case scenario is that we create new wrappers -- there are no runtime errors and it's still using the same logger internally (if it was even the same key). Incidentally, I have also never seen another {{NoSpamLogger}} message across thousands of runs of this test. With that in mind I stared a bit more at the [other thing|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L78-L82] that could cause this not to be logged. {{minIntervalNanos}} is coming from a [static field|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/db/ExpirationDateOverflowHandling.java#L39] and guaranteed to be set to a known value. {{expected}} is the default zero-initialized value of an AtomicLong. 
{{nowNanos}}, on the other hand, is coming from [{{System.nanoTime}}|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L59-L62], which (TIL) can be [negative|https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--]: bq. This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). I haven't found a way to prove it, but presently this is my only plausible theory. I think we should switch NoSpamLogger to use {{currentTimeMillis}}. We know it's non-monotonic and may be less precise, but I think it fits the bill for the spirit of this class, where callers are specifying intervals on the order of whole seconds and minutes. Please let me know if anyone has thoughts on that. > Fix flaky python dtest test_expiration_overflow_policy_capnowarn - > ttl_test.TestTTL > --- > > Key: CASSANDRA-15996 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15996 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: David Capwell >Assignee: Adam Holmberg >Priority: Normal > Fix For: 3.11.x, 4.0-beta > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860 > {code} > > assert warning, 'Log message should be print for CAP and > > CAP_NOWARN policy' > E AssertionError: Log message should be print for CAP and > CAP_NOWARN policy > E assert [] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
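The theory in the CASSANDRA-15996 comment above (a negative clock reading compared against a zero-initialized "last logged" value) can be illustrated with a small model. This is a sketch under stated assumptions, not the NoSpamLogger source: the class name NoSpamStatementModel, the form of the check (now - last >= interval), and the INTERVAL_NANOS value are hypothetical, and the clock readings are made up. Python integers do not wrap around, so the model says nothing about Java long overflow, which a real change using Long.MIN_VALUE would have to consider separately.

```python
class NoSpamStatementModel:
    """Toy model of a rate-limited log statement; not Cassandra's class."""

    def __init__(self, min_interval_nanos, initial_last_logged=0):
        self.min_interval_nanos = min_interval_nanos
        self.last_logged = initial_last_logged

    def should_log(self, now_nanos):
        # Assumed check: log only if a full interval has elapsed since the last log.
        if now_nanos - self.last_logged >= self.min_interval_nanos:
            self.last_logged = now_nanos
            return True
        return False


INTERVAL_NANOS = 5 * 60 * 10**9  # arbitrary interval chosen for the example

# A monotonic clock with an origin in the future returns negative readings.
negative_now = -(10**12)

# Zero-initialized state: the very first message is suppressed.
zero_init = NoSpamStatementModel(INTERVAL_NANOS, initial_last_logged=0)
first_with_zero = zero_init.should_log(negative_now)

# State initialized to the smallest 64-bit value: the first message goes out,
# and an immediate repeat is still rate-limited.
LONG_MIN = -(2**63)
min_init = NoSpamStatementModel(INTERVAL_NANOS, initial_last_logged=LONG_MIN)
first_with_min = min_init.should_log(negative_now)
second_with_min = min_init.should_log(negative_now + 1)
```

Under these assumptions the zero-initialized statement stays silent until the clock reading climbs past the interval, which would match a test that sees an empty list of warnings.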
[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214884#comment-17214884 ] Yifan Cai commented on CASSANDRA-16057: --- CI result from the latest in each branch. 3.11: [https://app.circleci.com/pipelines/github/yifan-c/cassandra/131/workflows/0fb514dd-3bed-4c07-a87f-981996b6fcfe] (unrelated failures) 3.0: [https://app.circleci.com/pipelines/github/yifan-c/cassandra/132/workflows/83facbf4-3b82-468c-aa7d-78f90b01cc09] (unrelated failures) 2.2: [https://app.circleci.com/pipelines/github/yifan-c/cassandra/121/workflows/d5d71199-342b-45f8-a1d1-3d57af414142] (unrelated failures) Both 3.11 and 3.0 dtest have failed test "test_closing_connections - thrift_hsha_test.TestThriftHSHA". cc: [~dcapwell] > Should update in-jvm dtest to expose stdout and stderr for nodetool > --- > > Key: CASSANDRA-16057 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16057 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Many nodetool commands output to stdout or stderr so running nodetool using > in-jvm dtest should expose that to tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-16157: -- Reviewers: David Capwell, Yifan Cai (was: David Capwell) > RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade > --- > > Key: CASSANDRA-16157 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16157 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/java >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 4.0-beta3 > > > When trying to upgrade 3.0 to 4.0, we often run into a problem if an > older node serves as a coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>     at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>     at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>     ... 7 more
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff
[ https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214877#comment-17214877 ] Yifan Cai commented on CASSANDRA-16211: --- cc: [~marcuse] > Improve job metadata queries exception handling in cassandra-diff > - > > Key: CASSANDRA-16211 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16211 > Project: Cassandra > Issue Type: Improvement > Components: Tool/diff >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The job metadata tracks the progress of the diff job. Sometimes, a job can > fail due to the progress update query failures. > The progress update queries can be categorized into 2 groups, critical and > trivial one. > When a query failed to update a trivial status (e.g. ProgressTracker), we > would mostly hope to continue the job and just log the failure. > When a query failed to update a critical status (e.g. JobLifeCycle), we can > apply the client-side retry strategy (e.g. exponential backoff) in addition > to the retry policy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff
[ https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-16211: -- Test and Documentation Plan: unit test Status: Patch Available (was: Open) PR: [https://github.com/apache/cassandra-diff/pull/13] The patch does what is described in the description: * Ignore query exceptions from queries in ProgressTracker * Retry (when a retry strategy is specified) queries in JobLifeCycle > Improve job metadata queries exception handling in cassandra-diff > - > > Key: CASSANDRA-16211 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16211 > Project: Cassandra > Issue Type: Improvement > Components: Tool/diff >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The job metadata tracks the progress of the diff job. Sometimes, a job can > fail due to the progress update query failures. > The progress update queries can be categorized into 2 groups, critical and > trivial one. > When a query failed to update a trivial status (e.g. ProgressTracker), we > would mostly hope to continue the job and just log the failure. > When a query failed to update a critical status (e.g. JobLifeCycle), we can > apply the client-side retry strategy (e.g. exponential backoff) in addition > to the retry policy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
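The two behaviours listed in the CASSANDRA-16211 update above (ignore failures of trivial updates, retry critical ones with a client-side strategy such as exponential backoff) could look roughly like the following. This is a minimal sketch and not the cassandra-diff patch: run_trivial, run_critical, and their parameters are hypothetical names, and the sleep function is injectable only so the example can run without waiting.

```python
import logging
import time

logger = logging.getLogger("diff_job_sketch")


def run_trivial(update):
    """Run a non-critical update; log a failure and carry on with the job."""
    try:
        update()
        return True
    except Exception:
        logger.warning("trivial progress update failed; continuing", exc_info=True)
        return False


def run_critical(update, max_attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Run a critical update, retrying with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return update()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the job
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** attempt)
```

A caller would wrap something like a ProgressTracker write in run_trivial and a JobLifeCycle write in run_critical. Any retry policy configured in the driver would still apply underneath each individual attempt.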
[jira] [Updated] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff
[ https://issues.apache.org/jira/browse/CASSANDRA-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-16211: -- Change Category: Operability Complexity: Low Hanging Fruit Status: Open (was: Triage Needed) > Improve job metadata queries exception handling in cassandra-diff > - > > Key: CASSANDRA-16211 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16211 > Project: Cassandra > Issue Type: Improvement > Components: Tool/diff >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The job metadata tracks the progress of the diff job. Sometimes, a job can > fail due to the progress update query failures. > The progress update queries can be categorized into 2 groups, critical and > trivial one. > When a query failed to update a trivial status (e.g. ProgressTracker), we > would mostly hope to continue the job and just log the failure. > When a query failed to update a critical status (e.g. JobLifeCycle), we can > apply the client-side retry strategy (e.g. exponential backoff) in addition > to the retry policy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra-dtest] branch master updated: Revert "Add test_truncate_failure"
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git The following commit(s) were added to refs/heads/master by this push: new 016a0eb Revert "Add test_truncate_failure" 016a0eb is described below commit 016a0eb38db25ab36e1adabbc0bfe9575212b2ec Author: Brandon Williams AuthorDate: Thu Oct 15 11:46:48 2020 -0500 Revert "Add test_truncate_failure" This reverts commit 8cb6bd23e62c4d3b4e208d3909361d6812182bc6. --- byteman/truncate_fail.btm | 8 cql_test.py | 33 - 2 files changed, 41 deletions(-) diff --git a/byteman/truncate_fail.btm b/byteman/truncate_fail.btm deleted file mode 100644 index fa9caba..000 --- a/byteman/truncate_fail.btm +++ /dev/null @@ -1,8 +0,0 @@ -RULE Throw during truncate operation -CLASS org.apache.cassandra.db.ColumnFamilyStore -METHOD truncateBlocking() -AT ENTRY -IF TRUE -DO - throw new RuntimeException("Dummy failure"); -ENDRULE \ No newline at end of file diff --git a/cql_test.py b/cql_test.py index dde7b7d..eced21d 100644 --- a/cql_test.py +++ b/cql_test.py @@ -1,5 +1,4 @@ import itertools -import re import struct import time import pytest @@ -765,38 +764,6 @@ class TestMiscellaneousCQL(CQLTester): [2, None, 2, None], [3, None, 3, None]]) -@since("4.0") -def test_truncate_failure(self): -""" -@jira_ticket CASSANDRA-16208 -Tests that if a TRUNCATE query fails on some replica, the coordinator will immediately return an error to the -client instead of waiting to time out because it couldn't get the necessary number of success acks. 
-""" -cluster = self.cluster -cluster.populate(3, install_byteman=True).start() -node1, _, node3 = cluster.nodelist() -node3.byteman_submit(['./byteman/truncate_fail.btm']) - -session = self.patient_exclusive_cql_connection(node1) -create_ks(session, 'ks', 3) - -logger.debug("Creating data table") -session.execute("CREATE TABLE data (id int PRIMARY KEY, data text)") -session.execute("UPDATE data SET data = 'Awesome' WHERE id = 1") - -self.fixture_dtest_setup.ignore_log_patterns = ['Dummy failure'] -logger.debug("Truncating data table (error expected)") - -thrown = False -exception = None -try: -session.execute("TRUNCATE data") -except Exception as e: -exception = e -thrown = True - -assert thrown, "No exception has been thrown" -assert re.search("Truncate failed on replica /127.0.0.3", str(exception)) is not None @since('3.2') class AbortedQueryTester(CQLTester): - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
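Editor's note: the docstring of the reverted test describes a fail-fast rule, where the coordinator reports an error as soon as one replica fails instead of waiting for acks that can no longer arrive. The sketch below simulates that rule in plain Python under stated assumptions. It is not Cassandra's coordinator code; `TruncateFailed`, `await_truncate_acks` and the `(replica, succeeded)` response tuples are hypothetical names used only here, and the error text merely mirrors the message the test searched for.

```python
class TruncateFailed(Exception):
    """Raised as soon as any replica reports a failed truncate (toy model)."""


def await_truncate_acks(responses, required_acks):
    """Consume (replica, succeeded) responses in arrival order.

    Fails fast on the first failure response rather than waiting for a
    timeout because the required number of acks was never reached.
    """
    acks = 0
    for replica, succeeded in responses:
        if not succeeded:
            raise TruncateFailed(f"Truncate failed on replica {replica}")
        acks += 1
        if acks >= required_acks:
            return acks
    # Responses ran out before enough acks arrived: the slow path the
    # fail-fast rule is meant to avoid.
    raise TimeoutError(f"received {acks} of {required_acks} acks")
```

With three replicas and a failure injected on the third node, the call raises immediately with a message naming `/127.0.0.3`, which is the condition the removed assertions checked.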
[jira] [Comment Edited] (CASSANDRA-15977) 4.0 quality testing: Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214813#comment-17214813 ] Andres de la Peña edited comment on CASSANDRA-15977 at 10/15/20, 4:21 PM: -- Here are the CI results: 3.11 [https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/] trunk [https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3] [https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/] [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/] I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/] or [this other one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/]) that I can't reproduce locally. 
Not sure whether they might be caused by the CI environment or there's a real problem. Increasing the request timeout doesn't help, so we could try to not so aggressively reuse the cluster. Right now 224 tests use the same cluster, working with a cluster per test like most dtests do might improve this, although that would make the test significantly slower. CC [~maedhroz] was (Author: adelapena): Here are the CI results: 3.11 https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/ trunk https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3 https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/ [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/] I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/] or [this other 
one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/]) that I can't reproduce locally. Not sure whether they might be caused by the CI environment or there's a real problem. Increasing the request timeout doesn't help, so we could try to not so aggressively reuse the cluster. Right now 224 tests use the same cluster, working with a cluster per test like most dtests do might improve this, although that would make the test significantly slower. CC [~maedhroz] > 4.0 quality testing: Read Repair > > > Key: CASSANDRA-15977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15977 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 13h 20m > Remaining
[jira] [Updated] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES
[ https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16177: Test and Documentation Plan: [CI|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/411/workflows/ba075847-0c8b-4838-ad47-0e0c4324dc0a/jobs/2377] [Patch|https://github.com/ekaterinadimitrova2/cassandra/pull/61] Status: Patch Available (was: In Progress) > jvm_upgrade_dtests job issue in CircleCI MIDRES > --- > > Key: CASSANDRA-16177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16177 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta > > > jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with > MIDRES: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES
[ https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214817#comment-17214817 ] Ekaterina Dimitrova commented on CASSANDRA-16177: - [~dcapwell] can you review and commit this patch, please? I believe it is the solution we discussed, CI run also proves it. Thanks! [CI|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/411/workflows/ba075847-0c8b-4838-ad47-0e0c4324dc0a/jobs/2377] [Patch|https://github.com/ekaterinadimitrova2/cassandra/pull/61] > jvm_upgrade_dtests job issue in CircleCI MIDRES > --- > > Key: CASSANDRA-16177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16177 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta > > > jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with > MIDRES: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214813#comment-17214813 ] Andres de la Peña commented on CASSANDRA-15977: --- Here are the CI results: 3.11 https://app.circleci.com/pipelines/github/adelapena/cassandra/116/workflows/4007a648-0a65-45a9-bcf6-4ef83017fbba https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/78/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/284/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/76/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/114/ trunk https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/7efe9b9a-e2c6-40d4-a183-86ddd9e599f3 https://app.circleci.com/pipelines/github/adelapena/cassandra/117/workflows/18fe4de3-faaf-4bf9-a74c-f4fe04bf844f https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-test/79/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/ https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest-upgrade/77/ [https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-dtest/115/] I'm seeing some timeout errors in {{ReadRepairQueryTypesTest}} (like [this one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/285/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testPointQueryOnWideTable_13__strategy_BLOCKING_coordinator_2_flush_false_paging_true_/] or [this other one|https://ci-cassandra.apache.org/view/all/job/Cassandra-devbranch-jvm-dtest/282/testReport/junit/org.apache.cassandra.distributed.test/ReadRepairQueryTypesTest/testRangeQueryWithFilterOnSelectedColumnOnSkinnyTable_14__strategy_BLOCKING_coordinator_2_flush_true_paging_false_/]) that I can't reproduce locally. Not sure whether they might be caused by the CI environment or there's a real problem. 
Increasing the request timeout doesn't help, so we could try to not so aggressively reuse the cluster. Right now 224 tests use the same cluster, working with a cluster per test like most dtests do might improve this, although that would make the test significantly slower. CC [~maedhroz] > 4.0 quality testing: Read Repair > > > Key: CASSANDRA-15977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15977 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 13h 20m > Remaining Estimate: 0h > > This is a subtask of CASSANDRA-15579 focusing on read repair. > [This > document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing] > lists and describes the existing functional tests for read repair, so we can > have a broad view of what is currently covered. We can comment on this > document and add ideas for new cases/tests, so it can gradually evolve to a > more or less detailed test plan. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214770#comment-17214770 ] Zhao Yang edited comment on CASSANDRA-15229 at 10/15/20, 3:20 PM: -- thanks for the review and feedback, merged to [trunk|https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4] was (Author: jasonstack): thanks for the review and feedback, merged to [trunk](https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4) > Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed > Chunks > > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: Zhao Yang >Priority: Normal > Fix For: 4.0, 4.0-beta > > Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, > 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, > 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, > 15229-unsafe.png > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. 
We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. > With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. > - > Since CASSANDRA-5863, chunk cache is implemented to use buffer pool. When > local pool is full, one of its chunks will be evicted and only put back to > global pool when all buffers in the evicted chunk are released. But due to > chunk cache, buffers can be held for long period of time, preventing evicted > chunk to be recycled even though most of space in the evicted chunk are free. > There two things need to be improved: > 1. Evicted chunk with free space should be recycled to global pool, even if > it's not fully free. It's doable in 4.0. > 2. Reduce fragmentation caused by different buffer size. With #1, partially > freed chunk will be available for allocation, but "holes" in the partially > freed chunk are with different sizes. We should consider allocating fixed > buffer size which is unlikely to fit in 4.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
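Editor's note: the recirculation idea in the CASSANDRA-15229 description (return an evicted chunk to the global pool as soon as it has free space, not only once it is fully free) can be modelled with a very small toy. The Python sketch below is an illustration only and does not reflect the actual `BufferPool` implementation; `Chunk`, `ToyPool` and their methods are hypothetical names. It also tracks free space as a single byte count, so it deliberately ignores the fragmentation ("holes" of different sizes) that the ticket lists as the second problem.

```python
from collections import deque


class Chunk:
    """A fixed-size region from which buffers are carved (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.free = capacity

    def allocate(self, size):
        """Reserve size bytes if they fit; report whether that succeeded."""
        if size > self.free:
            return False
        self.free -= size
        return True

    def release(self, size):
        """Give size bytes back, never exceeding the chunk capacity."""
        self.free = min(self.capacity, self.free + size)


class ToyPool:
    """Global pool that hands out fully free chunks before partial ones."""

    def __init__(self, chunk_size, chunk_count):
        self.fully_free = deque(Chunk(chunk_size) for _ in range(chunk_count))
        self.partially_free = deque()

    def recycle(self, chunk):
        """Accept an evicted chunk back into the global pool."""
        if chunk.free == chunk.capacity:
            self.fully_free.append(chunk)
        elif chunk.free > 0:
            # The recirculation step: a partially freed chunk becomes
            # available again instead of waiting for every buffer.
            self.partially_free.append(chunk)
        # A chunk with no free space has nothing to offer and is simply
        # not tracked here (a simplification of this toy model).

    def take_chunk(self):
        """Prefer completely free chunks, then fall back to partial ones."""
        if self.fully_free:
            return self.fully_free.popleft()
        if self.partially_free:
            return self.partially_free.popleft()
        return None
```

In this model a chunk that still has long-lived buffers pinned by the chunk cache can serve new allocations from its remaining space, which is the behaviour the ticket asks for when the pool runs out of completely free chunks.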
[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yang updated CASSANDRA-15229: -- Source Control Link: https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4 (was: https://github.com/apache/cassandra/pull/535) > Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed > Chunks > > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: Zhao Yang >Priority: Normal > Fix For: 4.0, 4.0-beta > > Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, > 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, > 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, > 15229-unsafe.png > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. 
> With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. > - > Since CASSANDRA-5863, chunk cache is implemented to use buffer pool. When > local pool is full, one of its chunks will be evicted and only put back to > global pool when all buffers in the evicted chunk are released. But due to > chunk cache, buffers can be held for long period of time, preventing evicted > chunk to be recycled even though most of space in the evicted chunk are free. > There two things need to be improved: > 1. Evicted chunk with free space should be recycled to global pool, even if > it's not fully free. It's doable in 4.0. > 2. Reduce fragmentation caused by different buffer size. With #1, partially > freed chunk will be available for allocation, but "holes" in the partially > freed chunk are with different sizes. We should consider allocating fixed > buffer size which is unlikely to fit in 4.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yang updated CASSANDRA-15229: -- Resolution: Fixed Status: Resolved (was: Ready to Commit) thanks for the review and feedback, merged to [trunk](https://github.com/apache/cassandra/commit/699a1f74fcc1da1952da6b2b0309c9e2474c67f4) > Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed > Chunks > > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: Zhao Yang >Priority: Normal > Fix For: 4.0, 4.0-beta > > Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, > 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, > 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, > 15229-unsafe.png > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. 
> With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. > - > Since CASSANDRA-5863, chunk cache is implemented to use buffer pool. When > local pool is full, one of its chunks will be evicted and only put back to > global pool when all buffers in the evicted chunk are released. But due to > chunk cache, buffers can be held for long period of time, preventing evicted > chunk to be recycled even though most of space in the evicted chunk are free. > There two things need to be improved: > 1. Evicted chunk with free space should be recycled to global pool, even if > it's not fully free. It's doable in 4.0. > 2. Reduce fragmentation caused by different buffer size. With #1, partially > freed chunk will be available for allocation, but "holes" in the partially > freed chunk are with different sizes. We should consider allocating fixed > buffer size which is unlikely to fit in 4.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated: CASSANDRA-15229: Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks
This is an automated email from the ASF dual-hosted git repository. jasonstack pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new 699a1f7 CASSANDRA-15229: Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks 699a1f7 is described below commit 699a1f74fcc1da1952da6b2b0309c9e2474c67f4 Author: Zhao Yang AuthorDate: Thu Oct 15 22:53:44 2020 +0800 CASSANDRA-15229: Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks * initiate multiple buffer pool for different lifespan and usages - Chunk Cache Buffer Pool - conf.file_cache_size_in_mb=512mb - Networking Buffer Pool - conf.temporary_cache_size_in_mb=128mb * Add overflowSize and usedSize to buffer pool metrics * re-circulate buffer pool Chunk for ChunkCache whenever it has free space, even thoughput it may not be able to allocate due to fragmentation patch by Zhao Yang; reviewed by Caleb Rackliffe and Aleksey Yeschenko for CASSANDRA-15229 --- CHANGES.txt| 1 + conf/cassandra.yaml| 13 +- .../org/apache/cassandra/cache/ChunkCache.java | 14 +- src/java/org/apache/cassandra/config/Config.java | 2 + .../cassandra/config/DatabaseDescriptor.java | 14 + .../db/streaming/CassandraStreamWriter.java| 6 +- .../cassandra/hints/ChecksummedDataInput.java | 6 +- .../hints/CompressedChecksummedDataInput.java | 13 +- .../io/util/BufferManagingRebufferer.java | 6 +- .../cassandra/metrics/BufferPoolMetrics.java | 45 +- .../cassandra/net/AsyncStreamingOutputPlus.java| 13 +- .../apache/cassandra/net/BufferPoolAllocator.java | 13 +- .../cassandra/net/FrameDecoderLegacyLZ4.java | 11 +- .../org/apache/cassandra/net/FrameEncoder.java | 9 +- .../org/apache/cassandra/net/FrameEncoderCrc.java | 2 +- .../org/apache/cassandra/net/FrameEncoderLZ4.java | 9 +- .../cassandra/net/FrameEncoderLegacyLZ4.java | 8 +- .../cassandra/net/FrameEncoderUnprotected.java | 2 +- 
.../apache/cassandra/net/HandshakeProtocol.java| 6 +- .../cassandra/net/InboundConnectionInitiator.java | 6 +- .../cassandra/net/LocalBufferPoolAllocator.java| 3 +- .../cassandra/net/OutboundConnectionInitiator.java | 4 +- .../org/apache/cassandra/net/ShareableBytes.java | 6 +- .../apache/cassandra/utils/memory/BufferPool.java | 466 - .../apache/cassandra/utils/memory/BufferPools.java | 79 .../apache/cassandra/net/ConnectionBurnTest.java | 4 +- .../cassandra/utils/memory/LongBufferPoolTest.java | 111 ++--- test/data/jmxdump/cassandra-4.0-jmx.yaml | 75 +++- .../cassandra/distributed/impl/Instance.java | 4 +- .../cassandra/metrics/BufferPoolMetricsTest.java | 125 -- .../unit/org/apache/cassandra/net/FramingTest.java | 6 +- .../cassandra/utils/memory/BufferPoolTest.java | 361 +++- 32 files changed, 1067 insertions(+), 376 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index fe3fef8..543a1cf 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0-beta3 + * Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks (CASSANDRA-15229) * Fail truncation requests when they fail on a replica (CASSANDRA-16208) * Move compact storage validation earlier in startup process (CASSANDRA-16063) * Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table (CASSANDRA-16155) diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml index ff414ed..37b18f9 100644 --- a/conf/cassandra.yaml +++ b/conf/cassandra.yaml @@ -469,13 +469,22 @@ concurrent_counter_writes: 32 # be limited by the less of concurrent reads or concurrent writes. concurrent_materialized_view_writes: 32 +# Maximum memory to use for inter-node and client-server networking buffers. +# +# Defaults to the smaller of 1/16 of heap or 128MB. This pool is allocated off-heap, +# so is in addition to the memory allocated for heap. The cache also has on-heap +# overhead which is roughly 128 bytes per chunk (i.e. 
0.2% of the reserved size +# if the default 64k chunk size is used). +# Memory is only allocated when needed. +# networking_cache_size_in_mb: 128 + # Enable the sstable chunk cache. The chunk cache will store recently accessed # sections of the sstable in-memory as uncompressed buffers. # file_cache_enabled: false # Maximum memory to use for sstable chunk cache and buffer pooling. -# 32MB of this are reserved for pooling buffers, the rest is used as an -# cache that holds uncompressed sstable chunks. +# 32MB of this are reserved for pooling buffers, the rest is used for chunk cache +# that holds uncompressed sstable chunks. # Defaults to the smaller of 1/4 of heap or
[jira] [Assigned] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES
[ https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova reassigned CASSANDRA-16177: --- Assignee: Ekaterina Dimitrova > jvm_upgrade_dtests job issue in CircleCI MIDRES > --- > > Key: CASSANDRA-16177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16177 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta > > > jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with > MIDRES: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps
[jira] [Updated] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES
[ https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16177: Complexity: Low Hanging Fruit (was: Normal) > jvm_upgrade_dtests job issue in CircleCI MIDRES > --- > > Key: CASSANDRA-16177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16177 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta > > > jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with > MIDRES: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps
[jira] [Commented] (CASSANDRA-16177) jvm_upgrade_dtests job issue in CircleCI MIDRES
[ https://issues.apache.org/jira/browse/CASSANDRA-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214745#comment-17214745 ] Ekaterina Dimitrova commented on CASSANDRA-16177: - The issue is that the number of workers shouldn't exceed the number of tests. > jvm_upgrade_dtests job issue in CircleCI MIDRES > --- > > Key: CASSANDRA-16177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16177 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-beta > > > jvm_upgrade_dtests work well in HIGHRES, but we see the following issue with > MIDRES: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/349/workflows/04bccc52-4e3e-41e2-9c04-93501ea4ce77/jobs/2167/steps -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
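Editor's note: the constraint stated in this comment (no more workers than tests) amounts to clamping the requested parallelism before the tests are divided up. The Python sketch below illustrates that clamp under stated assumptions. It is not the CircleCI configuration change from the patch; `effective_workers` and `split_tests` are hypothetical helpers, and the round-robin split is only one possible way to distribute tests.

```python
def effective_workers(requested_workers, test_count):
    """Clamp the worker count so that no worker is left without a test."""
    if requested_workers < 1:
        raise ValueError("at least one worker is required")
    # Never exceed the number of tests, but always keep one worker.
    return max(1, min(requested_workers, test_count))


def split_tests(tests, requested_workers):
    """Distribute tests round-robin over the clamped number of workers."""
    workers = effective_workers(requested_workers, len(tests))
    return [tests[i::workers] for i in range(workers)]
```

For example, asking for four workers with only two tests yields two buckets of one test each, rather than two populated buckets and two empty ones.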
[jira] [Commented] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214734#comment-17214734 ] Ekaterina Dimitrova commented on CASSANDRA-16210: - It turned out the issue is already solved for 4.0 with CASSANDRA-9425 Posting patch for [3.11 | https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear
[ https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-16210: Test and Documentation Plan: It turned out the issue is already solved for 4.0 with CASSANDRA-9425 Posting patch for [3.11 |https://github.com/ekaterinadimitrova2/cassandra/pull/59] CI run: [Java8 |https://jenkins-cm4.apache.org/job/Cassandra-devbranch/104/#showFailuresLink] Status: Patch Available (was: In Progress) > Synchronize Keyspace instance store/clear > - > > Key: CASSANDRA-16210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16210 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.x > > > Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one > random failure was reported which pointed to a race condition to be spotted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
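As a rough illustration of what "synchronize Keyspace instance store/clear" could mean for a race condition of this kind, the sketch below guards an instance registry with a single lock so that opening and dropping cannot interleave. It is written in Python rather than Cassandra's Java, and `InstanceRegistry`, `create`, `open`, and `drop` are hypothetical names; it is not the patch in the linked PR.

```python
import threading


class InstanceRegistry:
    """Hypothetical registry; metadata and instances change under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._metadata = {}   # name -> schema metadata
        self._instances = {}  # name -> opened instance

    def create(self, name, metadata):
        with self._lock:
            self._metadata[name] = metadata

    def open(self, name):
        # The metadata check and the instance store happen atomically, so a
        # concurrent drop cannot slip in between them.
        with self._lock:
            if name not in self._metadata:
                raise KeyError(f"unknown keyspace: {name}")
            if name not in self._instances:
                self._instances[name] = object()
            return self._instances[name]

    def drop(self, name):
        # Instance and metadata are removed together, so open() never sees
        # a state where only one of them is gone.
        with self._lock:
            self._instances.pop(name, None)
            self._metadata.pop(name, None)
```

The design choice here is a single coarse lock, which is simple to reason about; whether that is acceptable on the real code path is a separate question this sketch does not answer.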
[jira] [Updated] (CASSANDRA-16200) Nodetool ring unit testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-16200: - Reviewers: Brandon Williams, Brandon Williams (was: Brandon Williams) Brandon Williams, Brandon Williams Status: Review In Progress (was: Patch Available) > Nodetool ring unit testing > -- > > Key: CASSANDRA-16200 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16200 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/java >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 40m > Remaining Estimate: 0h > > Add nodetool ring testing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
[ https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214669#comment-17214669 ] Brandon Williams commented on CASSANDRA-15865: -- Go for it. :) bq. So the HH setting, along the rest of operations, are being sent to node2 imo. Sam's comment was similar, and I posted the trace above with a patch like the PR applied, since that was clearly wrong. When I dug in it looked like node2 flapped once after shutdown which was causing this. I can usually repro in a few hundred runs on j11, I'll see what happens. > Flaky dtest > hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow > --- > > Key: CASSANDRA-15865 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15865 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Sam Tunnicliffe >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 20m > Remaining Estimate: 0h > > I've seen this fail a couple of times under JDK11, when it doesn't appear to > be related to the changes under test. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16212) Cassandra version above 3.11.0 failing for ARM64
[ https://issues.apache.org/jira/browse/CASSANDRA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214668#comment-17214668 ] odidev commented on CASSANDRA-16212: Hi Team I am working on adding ARM64 support to ‘zipkin’. Zipkin uses the ‘zipkin-cassandra’ docker image in their build. But the image is available only for AMD64 platform. ‘Zipkin-cassandra’ is built from a Dockerfile which downloads and uses Cassandra. I have checked that Cassandra has included AArch64 support from version 3.11.X and above, and cassandra docker images are also available for ARM64 platform <[https://hub.docker.com/_/cassandra?tab=tags]>. For generating zipkin-cassandra docker images for ARM64, I have used cassandra version 3.11.0, and the image has been built fine. But if I use cassandra version above 3.11.0, say 3.11.8, then docker build fails with the below error after starting the server with “bin/cassandra -f” command: {code:java} ERROR [main] 2020-10-15 09:04:39,771 NativeLibraryLinux.java:64 - Failed to link the C library against JNA. Native methods will be unavailable. 
java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: Error loading shared library ld-linux-aarch64.so.1: No such file or directory (needed by /tmp/jna-3506402/jna3214742498288082263.tmp) at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[na:1.8.0_252] at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1946) ~[na:1.8.0_252] at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1828) ~[na:1.8.0_252] at java.lang.Runtime.load0(Runtime.java:809) ~[na:1.8.0_252] at java.lang.System.load(System.java:1088) ~[na:1.8.0_252] at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:851) ~[jna-4.2.2.jar:4.2.2 (b0)] at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:826) ~[jna-4.2.2.jar:4.2.2 (b0)] at com.sun.jna.Native.(Native.java:140) ~[jna-4.2.2.jar:4.2.2 (b0)] at com.sun.jna.NativeLibrary.(NativeLibrary.java:84) ~[jna-4.2.2.jar:4.2.2 (b0)] at org.apache.cassandra.utils.NativeLibraryLinux.(NativeLibraryLinux.java:55) ~[apache-cassandra-3.11.8.jar:3.11.8] at org.apache.cassandra.utils.NativeLibrary.(NativeLibrary.java:95) [apache-cassandra-3.11.8.jar:3.11.8] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:203) [apache-cassandra-3.11.8.jar:3.11.8] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:628) [apache-cassandra-3.11.8.jar:3.11.8] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768) [apache-cassandra-3.11.8.jar:3.11.8] {code} The build environment is based on the docker image *‘alpine:3.12’*, with *C.UTF-8* locale and *openjdk-8*. Here is the Dockerfile for building ‘zipkin-cassandra’ docker image <[https://github.com/openzipkin/zipkin/blob/2.21.5/docker/storage/cassandra/Dockerfile]> and please find the Dockerfile here for the base image used in zipkin-cassandra dockerfile <[https://github.com/openzipkin/docker-java/blob/1.8.0_252-b09/Dockerfile]>. 
There is a limitation in project ‘zipkin’, that the source code supports Cassandra version 3.11.3 and above. But if I use any other version other than 3.11.0, I get the above error that *ld-linux-aarch64.so.1* file is missing. Another constraint is to use an ‘alpine’ environment only, as building ‘zipkin-cassandra’ docker image involves installation script file, which has an alpine based coding format. For the resolution, I followed below JIRAs raised for similar issues in 3.11.X series: # https://issues.apache.org/jira/browse/CASSANDRA-13072 # https://issues.apache.org/jira/browse/CASSANDRA-13791 Accordingly, I have tried removing the jna-4.2.2 jar file from /lib and downloaded jna-4.4.0 jar; but this has not solved the problem. Also, I have downloaded ‘*ld-linux-aarch64.so.1*’ from here <[https://ughe.github.io/data/2018/ld-linux-aarch64.so.1]> and placed it at /lib/, but facing the same issue. Zipkin requires a cassandra version greater than v3.11.3 but it seems cassandra versions greater than v3.11.0 does not support ARM64 platform. It will be helpful if we have ARM64 support in current versions or please provide me with some pointers on the above issue so that I can add the same. > Cassandra version above 3.11.0 failing for ARM64 > - > > Key: CASSANDRA-16212 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16212 > Project: Cassandra > Issue Type: Task >Reporter: odidev >Priority: Normal > > Cassandra versions above 3.11.0 are failing on ARM64 platform with below > issue: > java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: > *Error loading
[jira] [Created] (CASSANDRA-16212) Cassandra version above 3.11.0 failing for ARM64
odidev created CASSANDRA-16212: -- Summary: Cassandra version above 3.11.0 failing for ARM64 Key: CASSANDRA-16212 URL: https://issues.apache.org/jira/browse/CASSANDRA-16212 Project: Cassandra Issue Type: Task Reporter: odidev Cassandra versions above 3.11.0 are failing on ARM64 platform with below issue: java.lang.UnsatisfiedLinkError: /tmp/jna-3506402/jna3214742498288082263.tmp: *Error loading shared library ld-linux-aarch64.so.1: No such file or directory (needed by /tmp/jna-3506402/jna3214742498288082263.tmp)* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
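One possible reading of this error is a libc mismatch: `ld-linux-aarch64.so.1` is the glibc dynamic loader for AArch64, while Alpine images use musl libc. The Python sketch below only reports which loader files are present; it does not fix anything. The `/lib` locations and the function name `describe_loaders` are assumptions made for illustration.

```python
import os
import platform

# The glibc loader file name comes from the error message; the /lib location
# is an assumption. The musl name is the one used by musl-based distributions.
GLIBC_LOADER = "/lib/ld-linux-aarch64.so.1"
MUSL_LOADER = "/lib/ld-musl-aarch64.so.1"


def describe_loaders(exists=os.path.exists, machine=None):
    """Return a short diagnosis of which AArch64 loaders are present."""
    machine = machine or platform.machine()
    if machine not in ("aarch64", "arm64"):
        return f"not an ARM64 host ({machine})"
    if exists(GLIBC_LOADER):
        return "glibc loader present"
    if exists(MUSL_LOADER):
        return "musl only: glibc-linked native libraries may fail to load"
    return "no known loader found"


if __name__ == "__main__":
    print(describe_loaders())
```

Running this inside the build container would show whether the image is musl-only, which may help narrow down why the bundled JNA native library cannot be linked.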
[jira] [Comment Edited] (CASSANDRA-15977) 4.0 quality testing: Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214624#comment-17214624 ] Andres de la Peña edited comment on CASSANDRA-15977 at 10/15/20, 11:36 AM: --- [~jmckenzie] Indeed, I've rebased the branches and I'm running CI, I'll post the results once it's finished. Also, as [discussed in Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K], I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are only skipped without vnodes, and they are easy to identify when/if we add support for virtual nodes in-JVM. was (Author: adelapena): [~jmckenzie] Indeed, I've rebased the branches and running CI, I'll post the results once it's finished. Also, as [discussed in Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K], I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are only skipped without vnodes, and they are easy to identify when/if we add support for virtual nodes in-JVM. > 4.0 quality testing: Read Repair > > > Key: CASSANDRA-15977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15977 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 13h 20m > Remaining Estimate: 0h > > This is a subtask of CASSANDRA-15579 focusing on read repair. > [This > document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing] > lists and describes the existing functional tests for read repair, so we can > have a broad view of what is currently covered. We can comment on this > document and add ideas for new cases/tests, so it can gradually evolve to a > more or less detailed test plan. 
[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214624#comment-17214624 ] Andres de la Peña commented on CASSANDRA-15977: --- [~jmckenzie] Indeed, I've rebased the branches and running CI, I'll post the results once it's finished. Also, as [discussed in Slack|https://the-asf.slack.com/archives/CK23JSY2K/p1602182196170300?thread_ts=1602111667.128800=CK23JSY2K], I'm adding a new {{@ported_to_in_jvm}} Python marker for dtests, so they are only skipped without vnodes, and they are easy to identify when/if we add support for virtual nodes in-JVM. > 4.0 quality testing: Read Repair > > > Key: CASSANDRA-15977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15977 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 13h 20m > Remaining Estimate: 0h > > This is a subtask of CASSANDRA-15579 focusing on read repair. > [This > document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing] > lists and describes the existing functional tests for read repair, so we can > have a broad view of what is currently covered. We can comment on this > document and add ideas for new cases/tests, so it can gradually evolve to a > more or less detailed test plan. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
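The skip rule described for the `ported_to_in_jvm` marker can be sketched as a pure function. This is an assumption about the intended behaviour, not the actual dtest code; `should_skip` and its parameters are hypothetical names.

```python
def should_skip(markers, use_vnodes):
    """Decide whether a Python dtest should be skipped.

    markers: set of marker names attached to the test.
    use_vnodes: True when the run is configured with virtual nodes.
    """
    # Tests already ported to in-JVM dtests are skipped only in runs
    # without vnodes; with vnodes the Python version still runs.
    return "ported_to_in_jvm" in markers and not use_vnodes
```

Keeping the marker name explicit also makes the affected tests easy to find later, which is the second goal mentioned in the comment.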
[jira] [Commented] (CASSANDRA-15977) 4.0 quality testing: Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214622#comment-17214622 ] Josh McKenzie commented on CASSANDRA-15977: --- ping [~adelapena] - think both those reqs are good now > 4.0 quality testing: Read Repair > > > Key: CASSANDRA-15977 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15977 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/unit >Reporter: Andres de la Peña >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 13h 20m > Remaining Estimate: 0h > > This is a subtask of CASSANDRA-15579 focusing on read repair. > [This > document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing] > lists and describes the existing functional tests for read repair, so we can > have a broad view of what is currently covered. We can comment on this > document and add ideas for new cases/tests, so it can gradually evolve to a > more or less detailed test plan. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15585) 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation
[ https://issues.apache.org/jira/browse/CASSANDRA-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214620#comment-17214620 ] Josh McKenzie commented on CASSANDRA-15585: --- {quote}Harry running regularly seems like a good "done" condition 4.0.0 {quote} Is this you volunteering to do this work? :) > 4.0 quality testing: Test Frameworks, Tooling, Infra / Automation > - > > Key: CASSANDRA-15585 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15585 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/python >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Jordan West* > This area refers to contributions to test frameworks/tooling (e.g., dtests, > QuickTheories, CASSANDRA-14821), and automation enabling those tools to be > applied at scale (e.g., replay testing via Spark-based replay of captured FQL > logs). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15580) 4.0 quality testing: Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214619#comment-17214619 ] Josh McKenzie commented on CASSANDRA-15580: --- Great points about the current mismatch of infra (k8s vs. else) fallout brings to the table. Agreed we shouldn't delay 4.0 on bringing that infra up to speed. There's a few other new testing frameworks that are falling into the "do one-off or point testing instead of coupling 4.0 GA to wiring these up in CI/CD" as well. Would have to dig into the code to answer the question about the specificity on targeting latest on mixed version cluster repair. trunk and previous major is probably quite fine. IMO, what you've enumerated above is a respectable set of new coverage for us to wire up that should hit the big targets we have and reduce our uncertainty in the process. > 4.0 quality testing: Repair > --- > > Key: CASSANDRA-15580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15580 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/python >Reporter: Josh McKenzie >Assignee: Alexander Dejanovski >Priority: Normal > Fix For: 4.0-rc > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Alexander Dejanovski* > We aim for 4.0 to have the first fully functioning incremental repair > solution (CASSANDRA-9143)! Furthermore we aim to verify that all types of > repair: (full range, sub range, incremental) function as expected as well as > ensuring community tools such as Reaper work. CASSANDRA-3200 adds an > experimental option to reduce the amount of data streamed during repair, we > should write more tests and see how it works with big nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208057#comment-17208057 ] Josh McKenzie edited comment on CASSANDRA-15579 at 10/15/20, 11:05 AM: --- [~bdeggleston] - confirming - you still have cycles to shepherd this? Update: removing due to inactivity. was (Author: jmckenzie): [~bdeggleston] - confirming - you still have cycles to shepherd this? > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: {color:#de350b}None{color}* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15579) 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, and Read Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh McKenzie updated CASSANDRA-15579: -- Description: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: {color:#de350b}None{color}* Testing in this area focuses on non-node-local aspects of the read-write path: coordination, replication, read repair, etc. was: Reference [doc from NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] for context. *Shepherd: Blake Eggleston* Testing in this area focuses on non-node-local aspects of the read-write path: coordination, replication, read repair, etc. > 4.0 quality testing: Distributed Read/Write Path: Coordination, Replication, > and Read Repair > > > Key: CASSANDRA-15579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15579 > Project: Cassandra > Issue Type: Task > Components: Test/unit >Reporter: Josh McKenzie >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: {color:#de350b}None{color}* > Testing in this area focuses on non-node-local aspects of the read-write > path: coordination, replication, read repair, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas
[ https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214614#comment-17214614 ] Josh McKenzie commented on CASSANDRA-15538: ---
So to summarize a checkpoint of where we are:
# The scope on this ticket as described is quite large and untargeted
# The historical pain in the StorageEngine seems to center around LegacyLayout stuff w/CASSANDRA-8099
# We believe randomized schema testing and query combinations are one of our best ways to increase confidence in this space
# Harry is well suited to this work
# Harry isn't quite ready for this in terms of us coupling the 4.0 GA with this
# Orthogonally, reverse queries and range tombstones have historically been a little sketchy
1-5 imply to me a subsequent iterative approach to improving our coverage there as we don't necessarily have reason to believe there's major regression between 3.0 and 4.0, or 3.11 and 4.0, in this area of the codebase. There's also been a significant raft of both real workload and real schema testing done against mixed version clusters straddling 2.1 and 3.0 so there's an argument we should be reasonably confident in the post CASSANDRA-8099 mixed version state.
So an option here would be to do the following (trying to keep things moving along; not married to this):
# Pre 4.0: Flesh out more testing for reverse queries and range tombstones
# Pre 4.0: Selectively take a look at code coverage analysis for unit testing in this domain and look for obvious gaps and beef up unit testing there
# Post 4.0 (4.0.x): incrementally work to wire up Harry, Fallout, generative cassandra-diff framework testing w/user schemas (coming soon)
[~ifesdjeen] / [~aleksey]: Thoughts on ^?
> 4.0 quality testing: Local Read/Write Path: Other Areas > --- > > Key: CASSANDRA-15538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15538 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/dtest/python >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0-beta > > > Reference [doc from > NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#] > for context. > *Shepherd: Aleksey Yeschenko* > Testing in this area refers to the local read/write path (StorageProxy, > ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still > finding numerous bugs and issues with the 3.0 storage engine rewrite > (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the > local read/write path with techniques such as property-based testing, fuzzing > ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]), > and a source audit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
[ https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Berenguer Blasi updated CASSANDRA-15865: Test and Documentation Plan: CI in PR. The only hints failure is unrelated and fails also on ci-cass. Status: Patch Available (was: In Progress) > Flaky dtest > hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow > --- > > Key: CASSANDRA-15865 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15865 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Sam Tunnicliffe >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 20m > Remaining Estimate: 0h > > I've seen this fail a couple of times under JDK11, when it doesn't appear to > be related to the changes under test. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid
[ https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214606#comment-17214606 ] Josh McKenzie commented on CASSANDRA-14746: --- {quote}Our hope is that we can invest the time and money ahead of time instead of after the release for 4.0. {quote} There's that saying that "an ounce of prevention is worth a pound of cure" for a reason. :) I'll dig around and see if I can surface any other large-scale performance testing. While I'd like us to move the needle on this one (as you and Vinay et al's work is doing), the ruthless pragmatist in me advocates for confidence in >= the performance of previous C* versions and getting the GA out for users and us iterating. With as extensive as the changes in the MS are, the testing you all have enumerated here on top of all the other smaller cluster perf testing devs have done seems like a reasonable suite to have adequate confidence in the "no regression" stake in the ground. > Ensure Netty Internode Messaging Refactor is Solid > -- > > Key: CASSANDRA-14746 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14746 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Streaming and Messaging >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Normal > Labels: 4.0-QA > Fix For: 4.0-beta > > > Before we release 4.0 let's ensure that the internode messaging refactor is > 100% solid. As internode messaging is naturally used in many code paths and > widely configurable we have a large number of cluster configurations and test > configurations that must be vetted. > We plan to vary the following: > * Version of Cassandra 3.0.17 vs 4.0-alpha > * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes > * Client request rates varying between 1k QPS and 100k QPS of varying sizes > and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...) 
> * Internode compression > * Internode SSL (as well as openssl vs jdk) > * Internode Coalescing options > We are looking to measure the following as appropriate: > * Latency distributions of reads and writes (lower is better) > * Scaling limit, aka maximum throughput before violating p99 latency > deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% > writes, 100% reads and 50-50 writes+reads (higher is better) > * Thread counts (lower is better) > * Context switches (lower is better) > * On-CPU time of tasks (higher periods without context switch is better) > * GC allocation rates / throughput for a fixed size heap (lower allocation > better) > * Streaming recovery time for a single node failure, i.e. can Cassandra > saturate the NIC > > The goal is that 4.0 should have better latency, more throughput, fewer > threads, fewer context switches, less GC allocation, and faster recovery > time. I'm putting Jason Brown as the reviewer since he implemented most of > the internode refactor. > Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey > Lynch (Netflix), Vinay Chella (Netflix) > Owning committer(s): Jason Brown -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
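The "scaling limit" metric in this description (maximum throughput before violating a p99 latency deadline of 10ms) can be made concrete with a small calculation. The Python sketch below is one way to compute it from per-rate latency samples; the nearest-rank percentile, the function names, and the shape of the `runs` input are assumptions, not part of the test plan.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), used as a 1-based rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def scaling_limit(runs, deadline_ms=10.0):
    """Highest request rate reached before p99 latency exceeds the deadline.

    runs maps a request rate (QPS) to the latencies measured at that rate.
    Returns None when even the lowest rate misses the deadline.
    """
    limit = None
    for qps in sorted(runs):
        if p99(runs[qps]) > deadline_ms:
            break  # first violation: higher rates are not considered
        limit = qps
    return limit
```

Stopping at the first violation matches the "before violating" wording; a run that happens to pass at a higher rate after a lower rate failed is deliberately ignored.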
[jira] [Commented] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
[ https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214495#comment-17214495 ] Berenguer Blasi commented on CASSANDRA-15865: - Stealing this one from [~brandon.williams], I hope you don't mind :-) I would argue {{statushandoff}}'s output is not being used as per the test's code, unless I am missing something. In fact, looking at the loop, {{node}} stays set to {{node2}} as this is the last value the initial loop leaves it at. So the HH setting, along the rest of operations, are being sent to {{node2}} imo. I couldn't repro as much as I tried but I am putting a PR up. It only makes sure the HH of 1m is effective on all nodes before proceeding. > Flaky dtest > hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow > --- > > Key: CASSANDRA-15865 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15865 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Sam Tunnicliffe >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > I've seen this fail a couple of times under JDK11, when it doesn't appear to > be related to the changes under test.
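The loop-variable behaviour described in this comment is a general Python property: a `for` loop variable keeps its last value after the loop ends. The sketch below reproduces the pattern in isolation; it is not the actual dtest code, and `configure_last_only` and `configure_each` are hypothetical names.

```python
def configure_last_only(nodes):
    """Pattern described in the comment: the loop variable outlives the loop."""
    for node in nodes:
        pass  # e.g. start each node
    # 'node' still refers to the last element here, so only that node
    # would receive a setting applied after the loop.
    return [node]


def configure_each(nodes):
    """Intended pattern: apply the setting inside a loop over all nodes."""
    configured = []
    for node in nodes:
        configured.append(node)  # e.g. apply the hint window to this node
    return configured
```

Called with `["node1", "node2"]`, the first function returns `["node2"]` and the second returns both nodes, which is the difference between configuring only the last node and configuring every node before proceeding.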
[jira] [Assigned] (CASSANDRA-15865) Flaky dtest hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow
[ https://issues.apache.org/jira/browse/CASSANDRA-15865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Berenguer Blasi reassigned CASSANDRA-15865: --- Assignee: Berenguer Blasi (was: Brandon Williams) > Flaky dtest > hintedhandoff_test.py::TestHintedHandoffConfig::test_hintedhandoff_setmaxwindow > --- > > Key: CASSANDRA-15865 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15865 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Sam Tunnicliffe >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > I've seen this fail a couple of times under JDK11, when it doesn't appear to > be related to the changes under test. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org