Re: Changing how build automation interacts with the tree

Axel Hecht Sun, 02 Mar 2014 13:50:28 -0800

Hi,

I've watched you guys thinking for an hour ;-)


Some comments from me.

Yes to moving build flows that generate assets into the tree.
Yes to having a way for developers to reproduce what automation does.

Yes to having jobs executed more on demand than on push, and to having them produce idempotent results.

Sceptical of the vision that we'll see the end of inbounds. The interactions between test results and rebase don't seem trivial enough to me to hope for non-backout, always-open trees via auto land.

I'm having an 'oh noes' for a single command called by automation. My main point here is the usefulness of the logs generated. When you put all sequential and parallel tasks into one single wrapper process, you end up with one big log file on ftp, like today. And if anything happens, one needs to read that log and reverse engineer which characters in this log are stdout/stderr, and to which task they belong. I know I can't tell good from bad in our logs.

OTOH, you could have all the structure of the process exposed in the automation and its reporting. If something goes wrong, you can tell the location of the problem in the process right away, and you can drill down to the task and its dependencies.
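To make the log point concrete, here is a minimal sketch, not anything automation does today. It assumes each task is its own subprocess; the run_task helper and the "hello" task are made up for illustration. Each task's stdout and stderr are captured separately and keyed by task name, so nobody has to reverse engineer one big log.

```python
import subprocess
import sys


def run_task(name, argv):
    """Run one task as its own process, keeping stdout and stderr apart."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return {
        "task": name,                   # which task produced this output
        "returncode": proc.returncode,  # non-zero points at the failing task
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# Hypothetical task: a child Python process writing to both streams.
result = run_task(
    "hello",
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
)
```

A reporting layer could then show one record per task instead of a single file on ftp.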

If I think of the problem, I'm thinking along these lines: Let's specify the process, as a DAG of serialized and parallelized tasks, inside the tree, and have the automation run that as is (*). Offer developers a console-only hook to that fragment of the complete automation process, akin to integration tests.

* while using buildbot, parallel tasks would need to be executed sequentially. I read the recent posts by Taras et al that buildbot isn't a solid requirement going forward.
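A rough sketch of the DAG idea, assuming Python 3.9+ for graphlib. The task names and the graph itself are hypothetical (the "upload" task in particular is invented for the example). The same in-tree description can be run strictly sequentially, as buildbot would need, or handed out in batches of tasks that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "compile": set(),
    "symbols": {"compile"},
    "package": {"compile"},
    "upload": {"symbols", "package"},
}


def run_sequentially(graph, actions):
    """Run every task once, in an order that respects the dependencies."""
    order = []
    for name in TopologicalSorter(graph).static_order():
        actions[name]()
        order.append(name)
    return order


def ready_groups(graph):
    """Return batches of tasks whose dependencies are all done.

    Tasks within one batch are independent and could run in parallel.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        groups.append(batch)
        sorter.done(*batch)
    return groups


# Placeholder actions; real ones would invoke the actual build steps.
actions = {name: (lambda: None) for name in TASKS}
order = run_sequentially(TASKS, actions)
groups = ready_groups(TASKS)
```

Here ready_groups(TASKS) puts "symbols" and "package" in the same batch, which is exactly the structure a reporting UI could expose per task.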

A few comments on mozharness. One of the earliest tasks it offered, IIRC, was multi-locale android builds. Sadly, it happens that it's not helping those developers that want to create and test multi-locale builds. Its monolithic deliverable isn't what developers need at the point when they test multi-locale builds, nor does it blend into the developer's setup. Folks like rnewman were glad once I explained how to avoid using mozharness for their builds. To me that's a sign of an inadequate level of abstraction.


And, as it's been mentioned all over the call, l10n repacks:

Testing: repacks are hard to test, and they should be. They're designed to be infallible, so that, no matter what happens in a localization, they're producing runnable builds. A test is challenged to tell the difference between a broken localization and a broken build system. We shouldn't overestimate the number of errors in the build that end up in a build bustage, and which of those are actually test failures. And which are not generating build failures, but are bustages. One example would be broken locale merge dirs. Anything can be in those, and the builds build and run fine. They're just not showing the right strings.

More generally, repacks are basically unowned at this point. There's a bit of ownership in build, in releng, and with me, as to how they're done. There's absolutely nothing as far as reporting goes. The agreement between John and me was "if there's anything odd, file a bug on releng to dig in".

That's as much as I can get out of my brain into writing. I wish I had an hour-long video to go back and forth about stuff ;-)


Axel

On 2/28/14, 9:48 PM, Gregory Szorc wrote:

(This is likely off-topic for many dev-platform readers. I was advised
to post here because RelEng monitors dev-platform and I don't like
cross-posting.)

The technical interaction between build automation and mozilla-central
has organically grown into something that's very difficult to maintain
and improve. There is no formal API between automation and
mozilla-central. As a result, we have automation calling into esoteric
or unsupported commands and make targets. Change is difficult because it
must be coordinated with automation changes. Build system maintainers
lack understanding of what is and isn't used in automation. It's
difficult to reproduce what automation does locally.

The current approach slows everyone down, leads to too-frequent breakage
(l10n repacks are a great example), and limits the efficiency of
automation.

I'm elated to state that at a meeting earlier today, we worked out a
solution to these problems! Full details are in bug 978211.

tl;dr we are going to marshal all interaction between automation and the
tree through a mach-like in-tree script. This script will establish a
clear, supported, and auditable API for automation tasks and will
establish a level of indirection allowing the tree to change without
requiring corresponding changes to automation.
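
As an illustration only (the actual design is in bug 978211), a mach-like entry point could be sketched as below. The program name "automation" and the command names "build" and "l10n-repack" are hypothetical placeholders, not the real interface. Automation only ever calls main() with a command name; what each command does stays private to the tree.

```python
import argparse

# Registry of supported automation commands: the auditable API surface.
COMMANDS = {}


def command(name):
    """Decorator that registers a function as a supported command."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("build")
def build(args):
    # Placeholder: the tree would invoke whatever build backend it prefers.
    return 0


@command("l10n-repack")
def l10n_repack(args):
    # Placeholder: repack logic would live in the tree, behind this name.
    return 0


def main(argv):
    """Dispatch one automation request to its registered handler."""
    parser = argparse.ArgumentParser(prog="automation")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return COMMANDS[args.command](args)
```

Calling main(["build"]) returns the handler's exit code, and anything not in COMMANDS is rejected by argparse, which keeps unsupported entry points out of automation.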

Some of the benefits of this approach include:

* Abstracting the build backend away from automation. The tree will
choose GNU make, pymake, mozmake, Tup, etc depending on what it knows is
best. Currently, automation has {make, pymake, mozmake} hard-coded.

* Allowing the build system to execute more efficiently. Currently,
automation executes compile, symbol generation, and packaging as
separate steps. This change opens the door to moving all these steps
into the core build system's DAG so they can run concurrently, allowing
your build jobs to complete sooner.

* Better support for l10n repacks. They have been a constant headache
for everyone who's touched them. This opens the door to moving more of
the logic into the tree, in a more well-defined API.

* A lot of us want to kill client.mk. Having automation not directly
calling it will allow us to finally do this.

* Clearer identification of problems and responsibilities. Currently,
when something like l10n repacks break, it's not clear if it was due to
a change in the tree or in automation or even what change caused the
regression! Sadly, lots of finger-pointing and "not my problem" tends to
ensue. This change will establish clearer borders and thus lead to
easier and better resolutions.
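
The first benefit in the list above (the tree choosing its own build backend) could look roughly like this. It is a sketch under assumptions: the preference order and the idea that each backend is an executable of that name on PATH are invented for the example, and choose_backend is a hypothetical helper.

```python
import shutil

# Hypothetical preference order; the tree would decide this, not automation.
PREFERRED_BACKENDS = ["mozmake", "make", "pymake"]


def choose_backend(candidates=PREFERRED_BACKENDS, which=shutil.which):
    """Return the first candidate backend that can be found on PATH."""
    for name in candidates:
        if which(name):
            return name
    raise RuntimeError("no supported build backend found")
```

The lookup function is a parameter so the choice can be exercised without depending on what is installed on the machine.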

Please follow bug 978211 for updates.


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
