Ivan Andika created HDDS-15501:
----------------------------------
Summary: Distributed System Testing in Ozone
Key: HDDS-15501
URL: https://issues.apache.org/jira/browse/HDDS-15501
Project: Apache Ozone
Issue Type: New Feature
Components: test
Reporter: Ivan Andika
Assignee: Ivan Andika
Currently, we only test Ozone using the traditional UT, IT, Acceptance Tests.
We had a MiniOzoneChaosCluster, but it seems unused. I propose to introduce a
distributed system testing and proofs system so that we can have the Ozone spec
as the shared mental model. Some of the regressions for issues like breaking
majority commit contract (HDDS-15052) is not detected since we don't have the
spec as the source of truth.
This is a parent task for the effort to introduce distributed system testing
and proofs to test the correctness of Ozone implementation (e.g. partial write
commit, container state transition, quasi closed handling, etc).
Distributed system testing tools:
- Jepsen, Ellen, Maelstorm
- Fray
- Hypothesis (Hegel)
- Antithesis (paid)
Distributed system proofs:
- TLA+
- Lean4
- P framework
Real systems
- 3FS (https://github.com/deepseek-ai/3FS/tree/main/specs) - uses P framework
- AWS S3
(https://cacm.acm.org/practice/systems-correctness-practices-at-amazon-web-services/
and https://p-org.github.io/P/casestudies/#case-studies)
I prefer if we can start with P framework since some storage systems already
used it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]