Marco and I ran into another urgent issue over the weekend that was causing
builds to fail.  This issue was unrelated to any feature development work
or to other recently applied CI fixes, but it did require quite a bit of
work from Marco (and a little from me) to fix.

We spent enough time on the problem that it caused us to take a step back
and consider how we could both fix issues in CI and support the 1.4 release
with the least impact possible on MXNet devs.  Marco had planned to make a
significant change to the CI to fix a long-standing Jenkins error [1], but
we feel that most developers would prioritize having a stable build
environment for the next few weeks over having this fix in place.

To introduce the new CI system properly, the intent was to do a gradual
blue/green rollout of the fix.  Managing this rollout would take
operational effort and double the compute load while we run both systems
in parallel.  This risks outages due to scaling limits, and we’d rather
make this change during a period of low developer activity, i.e. shortly
after the 1.4 release.

This means that from now until the 1.4 release, in order to reduce
complexity, MXNet developers should see only a single Jenkins verification
check and a single Travis check.
