[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2022-02-08 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489033#comment-17489033
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

+1

(looks like we might need one more +1 on the harry release)

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2022-02-07 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488192#comment-17488192
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

[~maedhroz] [~aratnofsky] i've pushed a new version and published harry release 
for vote. Would you be able to take another look?

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2022-01-24 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481566#comment-17481566
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

Finished a first pass at review and dropped comments inline in the PR.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2022-01-20 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479683#comment-17479683
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

I attempted to build the branch again this afternoon:

1.) Cleared local maven cache
2.) Moved aside existing {{settings.xml}}
3.) git clean -fxd && ant realclean && ant jar && ant build-test && ant 
generate-idea-files

Got the following:

{noformat}
[retry] Attempt [0]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to 
transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous 
attempt. This failure was cached in the local repository and resolution will 
not be reattempted until the update interval of resolver-apache-snapshots has 
elapsed or updates are forced. Original error: Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to 
transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous 
attempt. This failure was cached in the local repository and resolution will 
not be reattempted until the update interval of resolver-apache-snapshots has 
elapsed or updates are forced. Original error: Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[retry] Attempt [1]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to 
transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous 
attempt. This failure was cached in the local repository and resolution will 
not be reattempted until the update interval of resolver-apache-snapshots has 
elapsed or updates are forced. Original error: Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to 
transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous 
attempt. This failure was cached in the local repository and resolution will 
not be reattempted until the update interval of resolver-apache-snapshots has 
elapsed or updates are forced. Original error: Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[retry] Attempt [2]: error occurred; retrying...
[resolver:resolve] Could not transfer metadata 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xml from/to 
resolver-apache-snapshots (${artifact.remoteRepository.apacheSnapshots}): 
Cannot access ${artifact.remoteRepository.apacheSnapshots} with type default 
using the available connector factories: BasicRepositoryConnectorFactory
[resolver:resolve] 
org.apache.cassandra:harry-core:1.0.0-SNAPSHOT/maven-metadata.xmlfailed to 
transfer from ${artifact.remoteRepository.apacheSnapshots} during a previous 
attempt. This failure was cached in the local repository and resolution will 
not be reattempted until the update interval of resolver-apache-snapshots has 
elapsed or updates are forced. Original error: Could not transfer metadata 

[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2022-01-10 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472002#comment-17472002
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

[patch|https://github.com/apache/cassandra/pull/1382]

This introduces several ways to test Cassandra with Harry. 

1. Creating unit tests using the history builder

{code}
test(new SchemaGenerators.Builder("harry")
 .partitionKeySpec(1, 5)
 .clusteringKeySpec(1, 5)
 .regularColumnSpec(1, 10)
 .generator(),
 historyBuilder -> {
 historyBuilder.nextPartition()
   .simultaneously()
   .randomOrder()
 .partitionDeletion()
 .rangeDeletion()
   .finish();
 });
{code}

2. Generating SSTables

{code}
SSTableLoadingVisitor sstableVisitor = new SSTableLoadingVisitor(run, 
1000);
 LtsVisitor visitor = new GeneratingVisitor(run, sstableVisitor);
 Set pds = new HashSet<>();
 run.tracker.onLtsStarted((lts) -> {
 pds.add(run.pdSelector.pd(lts, run.schemaSpec));
 });
 for (int i = 0; i < 1000; i++)
 visitor.visit();

 sstableVisitor.forceFlush(0);
{code}

3. "normal" Harry capabilities (i.e. generate data using visitors and validate 
them using the model).




> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361840#comment-17361840
 ] 

Andres de la Peña commented on CASSANDRA-16262:
---

Great, thanks!

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.x
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-11 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361839#comment-17361839
 ] 

Brandon Williams commented on CASSANDRA-16262:
--

I have done it.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.x
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-11 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361835#comment-17361835
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

Sure, I'm good with this!

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-11 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361833#comment-17361833
 ] 

Adam Holmberg commented on CASSANDRA-16262:
---

I agree. I think it's time to re-assign or clear things that we are not posing 
as blockers.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361673#comment-17361673
 ] 

Andres de la Peña commented on CASSANDRA-16262:
---

I think we should, given that we are (hopefully) very close to GA. If it gets 
ready in time for GA we can always include it, but by now I will move it to 4.x 
and out of CASSANDRA-15579, so we can close the latter. wdyt?

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-06-09 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360193#comment-17360193
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

Should we move this to 4.x and close its parent, CASSANDRA-15579, which now 
depends only on this issue?

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-04-23 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330852#comment-17330852
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

[~adelapena] I'd like to merge as soon as it is possible of course, but at the 
same time I'd like it to be useful, so I'll take time and will avoid blocking 
RC. If I manage it in time for RC - even better. That said, it's largely done, 
just need to wrap up the runner DSL.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-04-22 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329450#comment-17329450
 ] 

Adam Holmberg commented on CASSANDRA-16262:
---

That's great to hear. Thanks for the heads up, insights, and all your work on 
that.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-04-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329281#comment-17329281
 ] 

Andres de la Peña commented on CASSANDRA-16262:
---

That is great news! Is the plan merging that fuzz testing kit in 4.0-rc? Should 
we keep this under CASSANDRA-15579 ?

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-04-22 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329209#comment-17329209
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

We've done over 700 hours of fuzz testing with Harry (this is a very 
conservative estimate, with all re-runs I believe it was even more than this), 
most of the tests with 5-node clusters, some with 4.0 only, and some mixed 
3.0/4.0. 

Clusters didn't have that much data (mostly under 10Gb), but I still have a 
good feeling about it, since we've exercised a lot of combinations: different 
schemas (with simple and composite partition keys, with and without static 
columns, with ASC and DESC clustering keys, and with different types for 
values), SELECT queries with read-repair, paging, ASC/DESC queries, made sure 
to include different kinds of deletions (range tombstones, partition deletions, 
row deletions), and ran tests with incremental repair, and repaired data 
tracking enabled.

Harry is now available as an artefact, and can be used as a library: 
https://repository.apache.org/content/repositories/snapshots/org/apache/cassandra/harry-core/
https://repository.apache.org/content/repositories/snapshots/org/apache/cassandra/harry-integration/

I will keep this ticket open until a small fuzz testing kit is merged into 
trunk, but I think it's fair to say that fuzz testing prerequisite for 
coordination and replication is fulfilled. 
cc [~cscotta] [~adelapena] [~blerer] [~aholmber]

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-03-25 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308551#comment-17308551
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

[~cscotta] sure. In fact, I've started working to make this possible. The first 
pull request is to Harry and it is here: 
https://github.com/apache/cassandra-harry/pull/7

There are several more things that I'd like to finish before we can start 
testing in earnest. On Harry side we only need to implement partition deletions 
(which is mostly ready), and implement statics (slightly more involved change, 
since it requires us to separate partition-level updates from row-level ones 
for efficiency). Other than that Harry tooling is adequate in my opinion. 

After this, we only need to hook up Harry into Cassandra tests, and add a 
simple QT-like DSL for generating schema and data, and then making checks and 
validations, and start running these tests for several hundred hours at very 
least, preferably more. FWIW, we do not strictly have to hook it up to 
Cassandra codebase, but I think it'll make it more accessible, since most 
people are already familiar with how to run Cassandra tests.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-03-20 Thread C. Scott Andreas (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305518#comment-17305518
 ] 

C. Scott Andreas commented on CASSANDRA-16262:
--

Thanks [~ifesdjeen].

Do you have a sense of what specific scope we should cover under this ticket, 
and/or what a definition of done might look like?

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-02-11 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283036#comment-17283036
 ] 

Alex Petrov commented on CASSANDRA-16262:
-

Given latest data points I would strongly advise against excluding this ticket 
from 4.0. Running Harry for a relatively short amount of time, we've been able 
to hit at three issues: 
 * Group By in-jvm paging issue: 
https://issues.apache.org/jira/browse/CASSANDRA-16427
 * Group By breaks range tombstone closer: 
https://issues.apache.org/jira/browse/CASSANDRA-16431
 * Reverse iteration + paging: 
https://issues.apache.org/jira/browse/CASSANDRA-16435 

I think we would've hit the first one without a fuzz tool, since it was a 
relatively obvious one, but looking at the output from Harry it was almost 
immediately clear what's going on, so I still consider its output useful.

Amount of human labour involved into producing scenarios that would trigger the 
issues such as the other two is quite significant. As of now, we can't even 
continue further testing of group by or reverse iteration with paging because 
we're constantly hitting these two issues. It can be that we won't hit any 
other ones, but I think we should at least exhaust the ability of current 
generators and models to find bugs, especially given they are rather simple 
compared to what we'd like to achieve in the future, and they're already 
available.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-01-28 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274013#comment-17274013
 ] 

Jeremiah Jordan commented on CASSANDRA-16262:
-

If we think that we have covered testing things in the face of operational 
events, then I would be ok with that.  But this is very important:

bq. What remains is verifying the distributed read and write paths in the face 
of common operational events, namely node restarts, bootstrapping, 
decommission, and cleanup.

So if we are not covering testing of those operational events then I would be 
against skipping this.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2021-01-28 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273891#comment-17273891
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

I was chatting w/ [~adelapena] the other day, and it feels like there's an 
argument for allowing 4.0 to release without this work being complete. We've 
certainly come a long way w/ CASSANDRA-15579 already, filling in a number of 
gaps that existed.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2020-12-09 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246824#comment-17246824
 ] 

David Capwell commented on CASSANDRA-16262:
---

Linked this with CASSANDRA-15588 as this is a similar need.

For the next week or so I plan to flesh out what exists in upgrade testing and 
what could be lacking, once this is done I want to help bootstrap this type of 
testing with a focus on upgrade path.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, decommission, 
> and cleanup. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16262) 4.0 Quality: Coordination & Replication Fuzz Testing

2020-11-11 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230177#comment-17230177
 ] 

Caleb Rackliffe commented on CASSANDRA-16262:
-

Note: See CASSANDRA-16213 for an example of bootstrapping and host replacement 
with gossip enabled in an in-JVM test.

> 4.0 Quality: Coordination & Replication Fuzz Testing
> 
>
> Key: CASSANDRA-16262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16262
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/fuzz
>Reporter: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-rc
>
>
> CASSANDRA-16180, CASSANDRA-16181, and CASSANDRA-15977 have largely focused on 
> auditing the existing tests around coordination, replication, and 
> read-repair, respectively. We've expanded existing test cases, added coverage 
> around components that we've refactored along the way, and added in-JVM dtest 
> upgrade tests where possible.
> What remains is verifying the distributed read and write paths in the face of 
> common operational events, namely node restarts, bootstrapping, and 
> decommission. If we can find a way to simulate these events, 
> [Harry|https://github.com/apache/cassandra-harry] seems like a good candidate 
> to host the verification logic itself.
> To keep things simple initially, I would propose that we start by testing 
> simple read-only and write-only workloads (the former without read repair).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org