Marco and I ran into another urgent issue over the weekend that was causing builds to fail. The issue was unrelated to any feature development work or to other recently applied CI fixes, but it did require quite a bit of work from Marco (and a little from me) to fix.
We spent enough time on the problem that we took a step back to consider how we could both fix issues in CI and support the 1.4 release with the least possible impact on MXNet devs. Marco had planned to make a significant change to the CI to fix a long-standing Jenkins error [1], but we feel that most developers would prioritize having a stable build environment for the next few weeks over having this fix in place. To introduce the new CI system properly, the intent was to do a gradual blue/green rollout of the fix. Managing this rollout would take operational effort and double the compute load while we run both systems in parallel. This risks outages due to scaling limits, and we’d rather make this change during a period of low developer activity, i.e. shortly after the 1.4 release. This means that from now until the 1.4 release, in order to reduce complexity, MXNet developers should only see a single Jenkins verification check and a single Travis check.