Hi everyone,

I'm +/-0 on this only because there's a little ambiguity in steps 2 and 4
I'd like to clear up. This email is part test status report and
part clarification, so I apologize in advance for the length.

It is absolutely _almost_ time we get 2.1 out the door.

Step 2 is the equivalent of sweeping all our possible problems under
the rug. The failing tests aren't necessarily failing because we have
a bad test suite. In fact, just last week I found a genuine race
condition leading to a broken Couch from one of these test cases[1].
I don't want to just sweep everything under the rug to get a release
out the door like we did for 2.0.0; if we'd held on for a few more weeks
for that release we might have found and fixed that bug (and a few
others, too).

It's worth noting that we can't disable /all/ of the failing tests for
a 2.1 release either; at least one of the failures can best be described
as "couchjs just sometimes segfaults." So unless we're ready to just
disable the entire JS test suite... ;) And for the detractors out there,
there are more failing EUnit than JS test cases right now (13 vs. 6)!

Step 4, for me, *must* include re-enabling all of the failing tests as
soon as possible (or, alternatively, only disabling them on the 2.1.x
branch). A PR I intend to land tomorrow, which has +1s from Paul and
Jan[2], will upload couch.log files from Travis and Jenkins when a test
fails to a central CouchDB for further analysis. Prior to this,
determining the actual failure required getting lucky and having one of
the tests fail on your machine. With the exception of the compression
daemon tests (whose timeout I increased just 4 days ago[3]), for most
of these test failures we simply need more data. Disabling the tests
now that we finally have useful CI telemetry is like launching a fleet of
satellites to monitor global climate, then banning the agency responsible
for them from monitoring them for vital data. :D

Thanks for reading. Let's move forward on 2.1...carefully.

-Joan

[1] https://github.com/apache/couchdb/commit/81ee7c5ac71e617a03e967b4fc5d0358f4ba9459
[2] https://github.com/apache/couchdb/pull/514
[3] https://github.com/apache/couchdb/commit/ca4761c6177748f6c87bd072939f7b3eb6fa1edd#diff-41b21ba8ff04bec904f235212d7c4de0

----- Original Message -----
From: "Jan Lehnardt" <j...@apache.org>
To: "dev" <dev@couchdb.apache.org>
Sent: Thursday, 11 May, 2017 1:41:35 PM
Subject: 2.1

Hi all,

we should get CouchDB 2.1 out soon and the test suite situation is a somewhat 
annoying blocker, so I’m proposing something that might sound unusual: disable 
the failing tests.

All test failures are intermittent and we must absolutely address this, but 
as nobody has picked this up since February, I think we need a new plan.

The one other issue is that the replication manager was merged recently and is 
still fairly new code, so I’m proposing this:

1. Fork 2.1.x off of master just before the replication scheduler merge.

    1.1. Backport to 2.1.x any other fixes that landed on master after the 
replication scheduler merge.

2. Disable all failing tests.

3. Start the release procedure.

4. Fix tests on master for 2.2, which then also can include the replication 
scheduler.

If there are no objections, I’m happy to prepare the 2.1.x branch early next 
week.

Best
Jan
--
