[DISCUSS] Drop Jepsen tests

Chesnay Schepler Wed, 09 Feb 2022 03:40:45 -0800

For a few years now we have had a set of Jepsen tests that verify the correctness of Flink's coordination layer in the case of process crashes. In the past they have indeed found issues and thus provided value to the project, and in general the core idea behind them (and Jepsen for that matter) is very sound.

However, so far we have neither attempted to make further use of Jepsen (we limited ourselves to very basic tests) nor familiarized ourselves with the tests/jepsen at all. As a result these tests are difficult to maintain. They (and Jepsen) are written in Clojure, which makes debugging, changes and upstreaming contributions very difficult. Additionally, the tests make use of a very complicated (Ververica-internal) terraform+ansible setup to spin up and tear down AWS machines. While it works (and is actually pretty cool), it's difficult to adjust because the people who wrote it have left the company.

The reason I'm raising this now (and not earlier) is that so far keeping the tests running wasn't much of a problem; bump a few dependencies here and there and we're good to go.

However, this has changed with the recent upgrade to ZooKeeper 3.5, which isn't supported by Jepsen out of the box, completely breaking the tests. We'd now have to write a new ZooKeeper 3.5+ integration for Jepsen (again, in Clojure). While I started working on that and could likely finish it, I started to wonder whether it even makes sense to do so, and whether we couldn't invest this time elsewhere.


Let me know what you think.
