[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489033#comment-17489033 ] Caleb Rackliffe commented on CASSANDRA-16262: - +1 (looks like we might need one more +1 on the harry release) > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Assignee: Alex Petrov >Priority: Normal > Labels: pull-request-available > Fix For: 4.x > > Time Spent: 10h 10m > Remaining Estimate: 0h > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
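The description asks for verifying reads and writes while nodes restart, starting with simple write-only and read-only workloads (reads without read repair). The sketch below illustrates that kind of model-based check. It is a hypothetical, single-process toy, not Harry and not the in-JVM dtest framework. It has three in-memory replicas, and one randomly chosen replica is "restarting" (unreachable) during every operation, so each write and each read reaches exactly two of three replicas, i.e. a quorum. A write-only phase is followed by a read-only phase, and every read is validated against a model of the latest acknowledged value per key. The class name, method names and parameters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical toy simulation (not Harry, not the in-JVM dtest API): RF = 3 in-memory replicas.
// During every write and every read, one random replica is "restarting" and unreachable,
// so each operation reaches exactly 2 of 3 replicas (a QUORUM).
final class QuorumSim {
    static final int RF = 3;

    // Per replica: key -> {value, writeTimestamp}
    private final List<Map<Integer, long[]>> replicas = new ArrayList<>();
    // Model: latest acknowledged value for every key written so far
    private final Map<Integer, Long> model = new HashMap<>();
    private final Random random;
    private long clock = 0;

    QuorumSim(long seed) {
        random = new Random(seed);
        for (int i = 0; i < RF; i++)
            replicas.add(new HashMap<>());
    }

    void write(int key, long value) {
        int down = random.nextInt(RF); // replica restarting during this write
        long ts = ++clock;             // monotonically increasing write timestamp
        for (int i = 0; i < RF; i++)
            if (i != down)
                replicas.get(i).put(key, new long[] {value, ts});
        model.put(key, value);         // 2 of 3 acks: the QUORUM write succeeds
    }

    // QUORUM read without read repair: the newest timestamp among reachable replicas wins.
    Long read(int key) {
        int down = random.nextInt(RF); // replica restarting during this read
        long[] newest = null;
        for (int i = 0; i < RF; i++) {
            if (i == down)
                continue;
            long[] cell = replicas.get(i).get(key);
            if (cell != null && (newest == null || cell[1] > newest[1]))
                newest = cell;
        }
        return newest == null ? null : newest[0];
    }

    // Write-only phase, then a read-only phase validated against the model.
    boolean run(int ops, int keys) {
        for (int op = 0; op < ops; op++)
            write(random.nextInt(keys), random.nextLong());
        for (Map.Entry<Integer, Long> e : model.entrySet())
            if (!e.getValue().equals(read(e.getKey())))
                return false;
        return true;
    }
}
```

Because any two sets of two replicas out of three overlap, every read reaches at least one replica that received the latest write, so `run` should return true for any seed. Reading from a single replica instead would break that guarantee, and that is the kind of divergence a model-based checker is meant to catch.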
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488192#comment-17488192 ] Alex Petrov commented on CASSANDRA-16262: - [~maedhroz] [~aratnofsky] I've pushed a new version and published a Harry release for a vote. Would you be able to take another look?
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481566#comment-17481566 ] Caleb Rackliffe commented on CASSANDRA-16262: - Finished a first pass at review and dropped comments inline in the PR.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479683#comment-17479683 ] Caleb Rackliffe commented on CASSANDRA-16262: - I attempted to build the branch again this afternoon:
1.) Cleared local maven cache
2.) Moved aside existing {{settings.xml}}
3.) git clean -fxd && ant realclean && ant jar && ant build-test && ant generate-idea-files
Got the following:
{noformat}
[retry] Attempt [0]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of resolver-apache-snapshots has elapsed or updates are forced.
Original error: Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of resolver-apache-snapshots has elapsed or updates are forced.
Original error: Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[retry] Attempt [1]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of resolver-apache-snapshots has elapsed or updates are forced.
Original error: Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of resolver-apache-snapshots has elapsed or updates are forced.
Original error: Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[retry] Attempt [2]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous attempt. This failure was cached in the local repository and resolution will not be reattempted until the update interval of resolver-apache-snapshots has elapsed or updates are forced.
Original error: Could not transfer metadata
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472002#comment-17472002 ] Alex Petrov commented on CASSANDRA-16262: - [patch|https://github.com/apache/cassandra/pull/1382]
This introduces several ways to test Cassandra with Harry.
1. Creating unit tests using the history builder:
{code}
// Generate a schema, then describe the operations to run against it
test(new SchemaGenerators.Builder("harry")
         .partitionKeySpec(1, 5)
         .clusteringKeySpec(1, 5)
         .regularColumnSpec(1, 10)
         .generator(),
     historyBuilder -> {
         // On the next partition, issue a partition deletion and a range deletion together, in random order
         historyBuilder.nextPartition()
                       .simultaneously()
                       .randomOrder()
                       .partitionDeletion()
                       .rangeDeletion()
                       .finish();
     });
{code}
2. Generating SSTables:
{code}
// Route generated operations into SSTables instead of through the write path
SSTableLoadingVisitor sstableVisitor = new SSTableLoadingVisitor(run, 1000);
LtsVisitor visitor = new GeneratingVisitor(run, sstableVisitor);

// Record the partition descriptor (pd) touched by each started logical timestamp (lts)
Set<Long> pds = new HashSet<>();
run.tracker.onLtsStarted((lts) -> {
    pds.add(run.pdSelector.pd(lts, run.schemaSpec));
});

// Run 1000 generation steps, then flush
for (int i = 0; i < 1000; i++)
    visitor.visit();
sstableVisitor.forceFlush(0);
{code}
3. "Normal" Harry capabilities (i.e. generating data using visitors and validating it against the model).
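In the SSTable example, each logical timestamp (`lts`) is mapped to a partition descriptor (`pd`) through `run.pdSelector.pd(lts, run.schemaSpec)`. A property a fuzzer can exploit is that such a mapping is deterministic: given the run's seed, a validator can recompute which partition any operation touched instead of storing it. The sketch below illustrates only that idea. The class `DescriptorSketch`, its `partitionCount` parameter and the use of the SplitMix64 mixing function are assumptions for illustration, not Harry's actual `pdSelector` implementation.

```java
// Hypothetical sketch (not Harry's selector): derive a partition descriptor (pd) from a
// logical timestamp (lts) and a run seed, so any lts can be mapped back to its partition later.
final class DescriptorSketch {
    private final long seed;
    private final int partitionCount; // hypothetical: number of distinct partitions a run visits

    DescriptorSketch(long seed, int partitionCount) {
        this.seed = seed;
        this.partitionCount = partitionCount;
    }

    // SplitMix64 finalizer: a bijective 64-bit mixing function.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // lts -> partition position in [0, partitionCount) -> pd; no state is kept between calls.
    long pd(long lts) {
        long position = Math.floorMod(mix(seed ^ lts), (long) partitionCount);
        return mix(seed + position);
    }
}
```

Because `mix` is a bijection, distinct positions yield distinct descriptors, so a run with this sketch touches at most `partitionCount` partitions, and two instances built from the same seed agree on every `lts`.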
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361840#comment-17361840 ] Andres de la Peña commented on CASSANDRA-16262: --- Great, thanks!
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361839#comment-17361839 ] Brandon Williams commented on CASSANDRA-16262: -- I have done it.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361835#comment-17361835 ] Alex Petrov commented on CASSANDRA-16262: - Sure, I'm good with this!
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361833#comment-17361833 ] Adam Holmberg commented on CASSANDRA-16262: --- I agree. I think it's time to re-assign or clear things that we are not posing as blockers.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361673#comment-17361673 ] Andres de la Peña commented on CASSANDRA-16262: --- I think we should, given that we are (hopefully) very close to GA. If it's ready in time for GA we can always include it, but for now I will move it to 4.x and out of CASSANDRA-15579, so we can close the latter. wdyt?
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360193#comment-17360193 ] Caleb Rackliffe commented on CASSANDRA-16262: - Should we move this to 4.x and close its parent, CASSANDRA-15579, which now depends only on this issue?
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330852#comment-17330852 ] Alex Petrov commented on CASSANDRA-16262: - [~adelapena] I'd like to merge as soon as possible, of course, but I'd also like it to be useful, so I'll take the time it needs and avoid blocking RC. If I manage it in time for RC, even better. That said, it's largely done; I just need to wrap up the runner DSL.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329450#comment-17329450 ] Adam Holmberg commented on CASSANDRA-16262: --- That's great to hear. Thanks for the heads up, insights, and all your work on that.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329281#comment-17329281 ] Andres de la Peña commented on CASSANDRA-16262: --- That is great news! Is the plan to merge that fuzz testing kit in 4.0-rc? Should we keep this under CASSANDRA-15579?
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329209#comment-17329209 ] Alex Petrov commented on CASSANDRA-16262: - We've done over 700 hours of fuzz testing with Harry (a very conservative estimate; counting all re-runs, I believe it was even more). Most of the tests ran on 5-node clusters, some on 4.0 only and some on mixed 3.0/4.0 clusters. The clusters didn't hold that much data (mostly under 10GB), but I still have a good feeling about it, since we exercised a lot of combinations: different schemas (simple and composite partition keys, with and without static columns, ASC and DESC clustering keys, and different value types); SELECT queries with read repair, paging, and ASC/DESC ordering; different kinds of deletions (range tombstones, partition deletions, row deletions); and runs with incremental repair and repaired data tracking enabled.
Harry is now available as an artefact and can be used as a library:
https://repository.apache.org/content/repositories/snapshots/org/apache/cassandra/harry-core/
https://repository.apache.org/content/repositories/snapshots/org/apache/cassandra/harry-integration/
I will keep this ticket open until a small fuzz testing kit is merged into trunk, but I think it's fair to say that the fuzz testing prerequisite for coordination and replication is fulfilled. cc [~cscotta] [~adelapena] [~blerer] [~aholmber]
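Checking reads against the kinds of deletions listed above requires a model that knows which writes each tombstone shadows. The sketch below is a hypothetical reference model of a single partition, not Harry's model classes; the class and method names are assumptions. It assumes distinct timestamps for writes to the same row and applies Cassandra's tie-breaking rule that a deletion shadows any write whose timestamp is less than or equal to the deletion's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reference model of one partition (not Harry's model).
// Assumes distinct timestamps for writes to the same row; a deletion shadows
// any write with a timestamp <= its own, matching Cassandra's tie-breaking.
final class PartitionModel {
    private static final class Cell {
        final long value;
        final long ts;
        Cell(long value, long ts) { this.value = value; this.ts = ts; }
    }

    private static final class RangeTombstone {
        final int lo, hi; // inclusive clustering bounds
        final long ts;
        RangeTombstone(int lo, int hi, long ts) { this.lo = lo; this.hi = hi; this.ts = ts; }
    }

    private final Map<Integer, Cell> rows = new TreeMap<>();
    private final Map<Integer, Long> rowTombstones = new TreeMap<>();
    private final List<RangeTombstone> rangeTombstones = new ArrayList<>();
    private long partitionTombstone = Long.MIN_VALUE;

    void write(int ck, long value, long ts) {
        Cell current = rows.get(ck);
        if (current == null || ts > current.ts)
            rows.put(ck, new Cell(value, ts)); // only the newest write can be visible
    }

    void deleteRow(int ck, long ts) { rowTombstones.merge(ck, ts, Long::max); }

    void deleteRange(int lo, int hi, long ts) { rangeTombstones.add(new RangeTombstone(lo, hi, ts)); }

    void deletePartition(long ts) { partitionTombstone = Math.max(partitionTombstone, ts); }

    // Newest deletion timestamp covering clustering key ck, from any kind of tombstone.
    private long deletionTs(int ck) {
        long max = Math.max(partitionTombstone, rowTombstones.getOrDefault(ck, Long.MIN_VALUE));
        for (RangeTombstone rt : rangeTombstones)
            if (rt.lo <= ck && ck <= rt.hi)
                max = Math.max(max, rt.ts);
        return max;
    }

    // What a correct full-partition read must return: live rows in clustering order.
    Map<Integer, Long> expectedRead() {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Cell> e : rows.entrySet())
            if (e.getValue().ts > deletionTs(e.getKey()))
                result.put(e.getKey(), e.getValue().value);
        return result;
    }
}
```

For example, suppose rows 1 to 4 are written at timestamps 1 to 4, row 1 is deleted at timestamp 5, and rows 2..3 are range-deleted at timestamp 3. Then `expectedRead()` returns only row 4: row 3's write ties with the range tombstone and is shadowed. A later write to row 2 at timestamp 6 makes it visible again, until a partition deletion at timestamp 10 hides everything.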
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308551#comment-17308551 ] Alex Petrov commented on CASSANDRA-16262: - [~cscotta] Sure. In fact, I've started working on making this possible. The first pull request is to Harry: https://github.com/apache/cassandra-harry/pull/7
There are several more things I'd like to finish before we can start testing in earnest. On the Harry side, we only need to implement partition deletions (mostly ready) and statics (a slightly more involved change, since it requires separating partition-level updates from row-level ones for efficiency). Other than that, the Harry tooling is adequate in my opinion. After this, we only need to hook Harry into the Cassandra tests, add a simple QT-like DSL for generating schema and data and then running checks and validations, and start running these tests for at least several hundred hours, preferably more. FWIW, we don't strictly have to hook it into the Cassandra codebase, but I think doing so will make it more accessible, since most people are already familiar with running Cassandra tests.
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305518#comment-17305518 ] C. Scott Andreas commented on CASSANDRA-16262: -- Thanks [~ifesdjeen]. Do you have a sense of what specific scope we should cover under this ticket, and/or what a definition of done might look like? > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Assignee: Alex Petrov >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283036#comment-17283036 ] Alex Petrov commented on CASSANDRA-16262: - Given the latest data points, I would strongly advise against excluding this ticket from 4.0. Running Harry for a relatively short amount of time, we've been able to hit three issues: * Group By in-jvm paging issue: https://issues.apache.org/jira/browse/CASSANDRA-16427 * Group By breaks range tombstone closer: https://issues.apache.org/jira/browse/CASSANDRA-16431 * Reverse iteration + paging: https://issues.apache.org/jira/browse/CASSANDRA-16435 I think we would've hit the first one without a fuzz tool, since it was relatively obvious, but looking at the output from Harry it was almost immediately clear what was going on, so I still consider its output useful. The amount of human labour involved in producing scenarios that would trigger issues such as the other two is quite significant. As of now, we can't even continue testing group by or reverse iteration with paging, because we're constantly hitting these two issues. It may be that we won't hit any others, but I think we should at least exhaust the ability of the current generators and models to find bugs, especially given that they are rather simple compared to what we'd like to achieve in the future, and they're already available. > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. 
We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274013#comment-17274013 ] Jeremiah Jordan commented on CASSANDRA-16262: - If we think that we have covered testing in the face of operational events, then I would be OK with that. But this is very important: bq. What remains is verifying the distributed read and write paths in the face of common operational events, namely node restarts, bootstrapping, decommission, and cleanup. So if we are not covering testing of those operational events, then I would be against skipping this. > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273891#comment-17273891 ] Caleb Rackliffe commented on CASSANDRA-16262: - I was chatting w/ [~adelapena] the other day, and it feels like there's an argument for allowing 4.0 to release without this work being complete. We've certainly come a long way w/ CASSANDRA-15579 already, filling in a number of gaps that existed. > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246824#comment-17246824 ] David Capwell commented on CASSANDRA-16262: --- Linked this with CASSANDRA-15588, as this is a similar need. For the next week or so I plan to flesh out what exists in upgrade testing and what could be lacking; once this is done, I want to help bootstrap this type of testing with a focus on the upgrade path. > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, decommission, > and cleanup. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).
[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing
[ https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230177#comment-17230177 ] Caleb Rackliffe commented on CASSANDRA-16262: - Note: See CASSANDRA-16213 for an example of bootstrapping and host replacement with gossip enabled in an in-JVM test. > 4.0 Quality: Coordination & Replication Fuzz Testing > > > Key: CASSANDRA-16262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16262 > Project: Cassandra > Issue Type: Task > Components: Test/fuzz >Reporter: Caleb Rackliffe >Priority: Normal > Fix For: 4.0-rc > > > CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on > auditing the existing tests around coordination, replication, and > read-repair, respectively. We've expanded existing test cases, added coverage > around components that we've refactored along the way, and added in-JVM dtest > upgrade tests where possible. > What remains is verifying the distributed read and write paths in the face of > common operational events, namely node restarts, bootstrapping, and > decommission. If we can find a way to simulate these events, > [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate > to host the verification logic itself. > To keep things simple initially, I would propose that we start by testing > simple read-only and write-only workloads (the former without read repair).