[ 
https://issues.apache.org/jira/browse/HDDS-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-15501:
-------------------------------
    Issue Type: Test  (was: New Feature)

> Distributed System Testing in Ozone
> -----------------------------------
>
>                 Key: HDDS-15501
>                 URL: https://issues.apache.org/jira/browse/HDDS-15501
>             Project: Apache Ozone
>          Issue Type: Test
>          Components: test
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Currently, we only test Ozone using the traditional UT, IT, Acceptance Tests. 
> We had a MiniOzoneChaosCluster (fault injection testing), but it seems 
> unused. I propose to introduce a distributed system testing and proofs system 
> so that we can have the Ozone spec as the shared mental model. Some of the 
> regressions for issues like breaking majority commit contract (HDDS-15052) is 
> not detected since we don't have the spec as the source of truth. 
> Additionally  sometimes simply we use our intuitions to guide our 
> implementation and fixes.
> This is a parent task for the effort to introduce distributed system testing 
> and proofs to test the correctness of Ozone implementation (e.g. partial 
> write commit, container state transitions, quasi closed, etc).
> Distributed system testing tools:
> - Jepsen, Ellen, Maelstorm
> - Fray
> - Hypothesis (Hegel)
> - Antithesis (paid) 
> Distributed system proofs:
> - TLA+
> - Lean4
> - P framework
> Real systems
> - 3FS (https://github.com/deepseek-ai/3FS/tree/main/specs) - uses P framework
> - AWS S3 
> (https://cacm.acm.org/practice/systems-correctness-practices-at-amazon-web-services/
>  and https://p-org.github.io/P/casestudies/#case-studies)
> - etcd robustness test 
> (https://github.com/etcd-io/etcd/tree/main/tests/robustness) - Uses 
> antithesis (among other things) 
> I prefer if we can start with P framework since some storage systems already 
> used it.
> Having a real industry-recognized spec helps to instil confidence in Ozone 
> robustness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to