Thanks for starting the discussion, Sean! I would really like to know what folks think about this. I think most of the magic of our hbck tool goes unappreciated because we cannot offer any proof of its correctness. A constructive "destruction" tool could provide that proof: either a standalone tool, or one that takes a cluster id/zk quorum and breaks that cluster in the same way.
While trying to test our operator tools RC, this was one of the points of friction I ran into, and I think many other enthusiasts have probably faced it too. For starters, there could be a doc that lists, for each of our hbck commands, the steps that bring the cluster into a state from which hbck can take it further. A follow-up tool would be a great addition.

-Sakthi

On Tue, Oct 1, 2019 at 2:11 AM Sean Busbey <[email protected]> wrote:

> I was chatting with Sakthi about automating some testing of hbck2 commands.
> Nothing too fancy, I just want some assurance that they ought to work.
>
> This got us talking about how we might purposefully break a cluster to meet
> a set of symptoms that hbck2 knows how to correct. We need something
> different from the chaos monkeys. In this case we're not trying to perturb
> the cluster in ways we think it should handle; we're setting up a state we
> already know requires an outside tool.
>
> Where should this kind of tooling live? Main repo next to the monkeys?
> Alongside hbck2 in operator tools? Somewhere else entirely?
