Hi Robert, also many thanks from my side for writing all this down! There are in fact some integration tests shipping with CouchDB that use the HTTP API (they can be run via `make javascript`) which need to be transferred to run against the cluster port (instead of the node-local one). I started working on this on my fork, but there's quite some work left.
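To make that a bit more concrete, here is a minimal sketch of the kind of round-trip such a test does, written against the clustered HTTP interface with plain Node.js instead of the existing couch.js harness; the port, database name and document below are assumptions for illustration only:

    // Minimal sketch: create a database and a document over the clustered
    // HTTP interface, then read the document back and check it.
    // Port and names are assumptions; the real suite has its own helpers.
    var http = require('http');

    var CLUSTER_PORT = 5984;        // clustered interface (assumed)
    var DB = 'test_suite_db';

    function request(method, path, body, callback) {
      var req = http.request({
        host: '127.0.0.1',
        port: CLUSTER_PORT,
        method: method,
        path: path,
        headers: { 'Content-Type': 'application/json' }
      }, function (res) {
        var data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () {
          callback(res.statusCode, JSON.parse(data || '{}'));
        });
      });
      if (body) req.write(JSON.stringify(body));
      req.end();
    }

    // create db -> create doc -> read doc back
    request('PUT', '/' + DB, null, function (status) {
      if (status !== 201 && status !== 412) throw new Error('create db failed: ' + status);
      request('PUT', '/' + DB + '/doc1', { value: 42 }, function (status) {
        if (status !== 201) throw new Error('create doc failed: ' + status);
        request('GET', '/' + DB + '/doc1', null, function (status, doc) {
          if (status !== 200 || doc.value !== 42) throw new Error('doc round-trip failed');
          console.log('ok: doc round-trip against the cluster port worked');
        });
      });
    });

The only real difference to today's tests would be the target port: the clustered interface (typically 5984) instead of the node-local backdoor (typically 5986).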
Maybe that's also a part of the puzzle, besides having more unit tests or leveraging PouchDB et al.

Best
Sebastian

On Tue, Sep 8, 2015 at 11:38 AM, Jason Smith <jason.h.sm...@gmail.com> wrote:

> Hi, Robert. Well, that was very long and I read every word! Thank you for your thoughtful assessment and feedback!
>
> You make many points across several different areas of concern. It sounds to me like the primary thrust of your argument is about continuous integration--somehow making it easier and more transparent for developers (seasoned and novice) to see the effects of their change. Is that right?
>
> So, what is the immediate next step that you propose to fix or improve? What is the most squeaky wheel, or the most modest gain that we can make?
>
> Thanks again!
>
> On Mon, Sep 7, 2015 at 9:49 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>
> > Hi list,
> >
> > I had my first real programming job in a web agency. We were building mobile websites for our external customers. The development process followed a strict waterfall model: after you finished your project it was thrown over the fence to the QA folks, who tested the website manually. At some point, when all found bugs were fixed, the product got released. Sometimes a bugfix took several iterations of going back and forth between QA and dev. Sometimes the QA folks forgot to test whether the bug was really fixed after the third iteration, as humans make errors, especially when it is a boring task.
> >
> > After a lifetime of 2 years, the customer usually wanted a redesign, and the next developer threw everything of the old website away, as not much of it was reusable. When something broke during the life of such an application/website, the company and developers noticed it because the customer called them.
> >
> > These days I often think about my time in my first job and the customers calling the company about bugs. Right now the Erlang core of CouchDB reminds me quite often of how our customers called us when something broke. For CouchDB 2 it is not the users that call us right now, but PouchDB and Fauxton when their test suites get red or incredibly slow.
> >
> > When I started to work on Fauxton we were in a somewhat similar situation: a merge usually had 2-3 fixes on the same or the next day, because something somehow broke. A proper review for a change took an incredible amount of time and every pull request needed a lot of iterations. The most evil changes were the ones where you fixed a bug in one area, and a completely different area broke. These bugs were so evil because they were surprising and unexpected.
> >
> > After a really big issue we tried out, for several weeks, that every change would need unit and/or integration tests.
> >
> > Writing unit tests was very hard in those days; most of the code was not designed to be tested afterwards. So most of the tests written in that time were Selenium-based integration tests. This was better than nothing, but they were slow. Nevertheless, they caught a lot of bugs, even before code was sent to review. When we made the switch to React and a different architecture for our components and their communication, better and easier unit/functional testing was one of the many design goals.
> >
> > Writing good Selenium tests is incredibly hard, as you basically face the same problems you have in the distributed systems world in the world of modern UIs. State and data change asynchronously and you never know when the result of your request, which changes the UI, is returned. It even gets harder by putting this on top of an eventually consistent system :)
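As an inline aside: the one pattern that helps most with that asynchrony is to wait on an explicit condition instead of asserting right away or sleeping for a fixed time. A rough sketch using the selenium-webdriver Node package, where the Fauxton URL and the CSS selector are made up for illustration:

    // Rough sketch: wait until the UI actually reflects the change instead
    // of sleeping for a fixed amount of time. URL and selector are invented.
    var webdriver = require('selenium-webdriver');
    var By = webdriver.By;
    var until = webdriver.until;

    var driver = new webdriver.Builder().forBrowser('firefox').build();

    driver.get('http://localhost:8000/#/_all_dbs');
    // Explicit wait: block until the row for the new database shows up,
    // or fail after 10 seconds, instead of asserting immediately.
    driver.wait(until.elementLocated(By.css('.databases .database-row')), 10000);
    driver.findElement(By.css('.databases .database-row')).getText().then(function (text) {
      console.log('first row:', text);
      return driver.quit();
    });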
> > Sometimes we also pair-programmed in a video call in front of a test when coming up with a good test was the problem, as we also had to share and gain knowledge in the team about how to write tests. One mistake we made was not fixing flaky tests immediately, adding more and more flaky tests and just restarting the CI for a few weeks. At some point our CI system just collapsed and it took the team a long time to fix those tests, partly also because tests were a new topic and it is hard to understand why Selenium tests fail in an async JS application with a cluster beneath it which is eventually consistent. Additionally, it decreased the confidence that a red test signals a bug.
> >
> > After fixing the test suite in this early crisis we've got pretty stable tests. Additionally, reviews take much less time. Reviews got even faster when we added a pre-check for our coding guidelines to the test suite. If you take a look at a day in the time when we began writing tests, we sometimes lost an hour writing those tests, because it was hard. Looking just at this short timeframe it felt like we were losing time. But if we take a look at a longer timeframe we did not slow down in delivering features, for several reasons: fewer bugfixes needed after code got on master, fewer tickets and less communication because of fewer bugs, faster reviews because many bugs are caught before the reviewer takes a look, and so on. Some people on the team still hate writing tests, but everyone on the team has also realized, and mentions, that we have no other option.
> >
> > Right now I am seeing the same pattern we observed some time ago in Fauxton for CouchDB's Erlang core:
> >
> > - not noticing something broke (if you break it, will you even notice?)
> > - eventually a user in the broader sense notices something broke
> > - a lot of patching to fix patches where stuff broke in another place
> > - regressions
> > - it takes time to review code properly because of a large API surface and a lot of functionality
> >
> > I would like to propose that every PR needs unit, functional AND/OR integration tests for CouchDB's Erlang core, too - given it is not already covered by existing tests that run. Maybe we could try the new approach for 3 months and then decide if we keep it.
> >
> > Most of CouchDB's bugs are noticed by interacting with the HTTP API. As a minimum, a simple integration test showing that CouchDB compiles and that the bug got fixed / the feature works would be enough.
> >
> > Let's take a look at what we have for testing now:
> >
> > - Erlang unit tests
> > - Erlang functional tests
> > - integration tests written in Erlang, which use ibrowse [1]
> > - integration tests in JavaScript which test the backdoor ports
> > - Fauxton's integration tests simulating a user and their browser
> > - PouchDB's integration tests
> >
> > So there are a lot of different ways to write tests. One problem for testing is our multi-repository approach. The multiple PRs over different repos must somehow get united again. Here is how we could solve it as a first step [2]. The test suite is kicked off once all PRs are opened, by putting the branch names for the repositories that should not be master into the config fields.
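As I read it, on the CI side that config-field idea could boil down to something like the sketch below: every repo defaults to master, and a per-repo field (shown here as an environment variable) overrides the branch for a cross-repo change. The repository list, variable names and clone URLs are assumptions for illustration:

    // Hypothetical sketch of the "branch name per repo" config idea on CI:
    // check out master everywhere unless a branch was given for a repo.
    // Repo list, env var names and URLs are invented for illustration.
    var execSync = require('child_process').execSync;

    var repos = ['couchdb', 'couchdb-chttpd', 'couchdb-couch', 'couchdb-fauxton'];

    repos.forEach(function (repo) {
      // e.g. COUCHDB_CHTTPD_BRANCH=my-fix-branch overrides the default
      var envName = repo.toUpperCase().replace(/-/g, '_') + '_BRANCH';
      var branch = process.env[envName] || 'master';
      execSync('git clone --branch ' + branch +
               ' https://github.com/apache/' + repo + '.git',
               { stdio: 'inherit' });
    });

    // ...then build the combined checkout and run the full test suite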
> > When the test suite is green, another button could appear to automate the merge of all repos involved in the change, but that would be an additional feature [3].
> >
> > The reviewer and the submitter of the PR would both be responsible for making sure that a merge is covered by tests and the test suite is green.
> >
> > The important thing is that this is a first step, and depending on how creative we get it can keep improving indefinitely, like with Fauxton, where a lot has changed in the very short timeframe since we started requiring tests. The important step is that we somehow start at some point, and everybody is welcome and invited (and should feel empowered) to make the processes of how we write software and tests better.
> >
> > Flaky tests are a problem and should get fixed immediately, as an unreliable test suite does not make much sense (remember the story of the boy who cried wolf). We learned that quite painfully in Fauxton.
> >
> > The change will need some discipline and maybe also stepping temporarily out of our comfort zone, but I am certain we will gain speed, improve our daily workflows and improve CouchDB's quality over the long term. I think a 3 month timeframe to try the new approach would be a good time to find out whether we are improving or things are getting worse.
> >
> > Best,
> > Robert :)
> >
> > [1] https://github.com/apache/couchdb-chttpd/blob/master/test/chttpd_db_test.erl#L67
> > [2] https://cloudup.com/cOgxRPbt9aP
> > [3] https://cloudup.com/c0VJDIoYqmI