[
https://issues.apache.org/jira/browse/HDDS-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087123#comment-18087123
]
Ivan Andika commented on HDDS-15501:
------------------------------------
[~henrybear327] Since I see that you have been working for etcd robustness
testing for a while, I hope you can give us your two cents on this.
> Distributed System Testing in Ozone
> -----------------------------------
>
> Key: HDDS-15501
> URL: https://issues.apache.org/jira/browse/HDDS-15501
> Project: Apache Ozone
> Issue Type: New Feature
> Components: test
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> Currently, we only test Ozone using the traditional UT, IT, Acceptance Tests.
> We had a MiniOzoneChaosCluster, but it seems unused. I propose to introduce a
> distributed system testing and proofs system so that we can have the Ozone
> spec as the shared mental model. Some of the regressions for issues like
> breaking majority commit contract (HDDS-15052) is not detected since we don't
> have the spec as the source of truth and sometimes simply uses intuitions to
> guide our implementation.
> This is a parent task for the effort to introduce distributed system testing
> and proofs to test the correctness of Ozone implementation (e.g. partial
> write commit, container state transitions, quasi closed, etc).
> Distributed system testing tools:
> - Jepsen, Ellen, Maelstorm
> - Fray
> - Hypothesis (Hegel)
> - Antithesis (paid)
> Distributed system proofs:
> - TLA+
> - Lean4
> - P framework
> Real systems
> - 3FS (https://github.com/deepseek-ai/3FS/tree/main/specs) - uses P framework
> - AWS S3
> (https://cacm.acm.org/practice/systems-correctness-practices-at-amazon-web-services/
> and https://p-org.github.io/P/casestudies/#case-studies)
> - etcd robustness test
> (https://github.com/etcd-io/etcd/tree/main/tests/robustness) - Uses
> antithesis (among other things)
> I prefer if we can start with P framework since some storage systems already
> used it.
> Having a real industry-recognized spec helps to put confidence
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]