On 12-08-15 6:17 PM, William Lachance wrote:
On 08/14/2012 03:47 PM, Gregory Szorc wrote:
On 8/14/12 12:14 PM, Ed Morley wrote:
On Thursday, 9 August 2012 15:35:28 UTC+1, Justin Lebar  wrote:
Is there a plan to mitigate the coalescing on m-i?  It seems like that
is a big part of the problem.

Reducing the amount of coalescing permitted would just mean we end up
with a backlog of pending tests on the repo tip - which would result
in tree closures regardless. So other than bug 690672 making sheriffs'
lives easier, we just need more machines in the test pool - since it's
simply a case of demand exceeding capacity.
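
To make the capacity point concrete, here is a toy back-of-envelope
model (every number below is invented, not a real infrastructure
figure): once jobs arrive faster than the pool can retire them, the
pending count grows hour over hour no matter how the scheduler
coalesces.

  # Toy model of the test pool; all numbers are made up for illustration.
  pushes_per_hour = 12            # assumed push rate to the tree
  jobs_per_push = 200             # assumed build/test jobs triggered per push
  machines = 1500                 # assumed size of the test pool
  avg_job_minutes = 40            # assumed average job duration

  arriving_per_hour = pushes_per_hour * jobs_per_push         # 2400 jobs/hour
  finishing_per_hour = machines * (60.0 / avg_job_minutes)    # 2250 jobs/hour

  backlog = 0
  for hour in range(1, 9):
      backlog = max(backlog + arriving_per_hour - finishing_per_hour, 0)
      print("hour %d: ~%d jobs pending" % (hour, backlog))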

The situation is made worse now that we're adding new platforms (OS X
10.7, B2G GB, B2G ICS, Android ARMv6, soon OS X 10.8, Win8 desktop,
Win8 Metro) faster than we're EOLing old ones - and we're pushing more
changes per day than ever before [1]. From what I understand, Apple's
aggressive hardware cycle is also making it difficult to expand the
test pool [2].

Is there a tracking bug for areas where we could gain efficiency? We all
know the build phase is full of clownshoes. But I believe we also do
silly things, like executing some tests serially and only using 1/N of
the available CPU cores in the process. This is just wasting resources.
See [1] for a concrete example.
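
On the serial-test point, the general fix is just to fan independent
test chunks out across all the cores on the machine. A minimal sketch
of the idea (run_chunk and the chunk names are placeholders, not the
actual harness code):

  # Sketch: run independent test chunks in parallel instead of one after
  # another. run_chunk and the chunk list are stand-ins, not real harness code.
  import multiprocessing
  import time

  def run_chunk(chunk):
      # In reality this would invoke the harness on one chunk of tests;
      # here we just simulate some work and return a fake exit status.
      time.sleep(0.1)
      return chunk, 0

  chunks = ["mochitest-%d" % i for i in range(1, 9)]

  if __name__ == "__main__":
      pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
      results = pool.map(run_chunk, chunks)   # keeps all cores busy, not 1/N
      pool.close()
      pool.join()
      for name, status in results:
          print("%s exited with status %d" % (name, status))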

Last year we had a buildfaster project to try to improve our
end-to-end build/test times:

https://wiki.mozilla.org/ReleaseEngineering/BuildFaster

I think it has recently been reactivated, I believe mostly with the
intention of working on build times (which is important, but only a
small part of the overall picture):

http://coop.deadsquid.com/2012/07/reviving-buildfaster-fixing-makefiles/

In general I would be very careful before tackling any particular bug
for the sake of improving our build/test times. If something is slow,
but not on the critical path as far as build/test is concerned, fixing
it will not result in any tangible improvement.
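
To illustrate why the critical path is what matters, here is a toy
pipeline (the task names and durations are invented): shaving time off
a job that isn't on the longest dependency chain doesn't move the
end-to-end number at all.

  # Toy pipeline with invented durations (minutes) showing why only work
  # on the critical path shortens the end-to-end time.
  import functools

  durations = {"build": 90, "package": 10, "unit-tests": 60, "talos": 45}
  # Tests depend on the packaged build; the two test jobs run in parallel.
  deps = {"package": ["build"], "unit-tests": ["package"], "talos": ["package"]}

  @functools.lru_cache(maxsize=None)
  def finish_time(task):
      start = max((finish_time(d) for d in deps.get(task, [])), default=0)
      return start + durations[task]

  print("end to end: %d minutes" % max(finish_time(t) for t in durations))
  # Making talos 20 minutes faster changes nothing (it isn't on the critical
  # path); making the build 20 minutes faster moves the finish line by 20.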

When I was working on this project last year, I designed a build charts
view to help visualize which parts were taking the longest (you can see
implicit dependencies between build/test tasks by seeing when certain
jobs run), which proved very helpful in determining which areas we
needed to optimize:

http://brasstacks.mozilla.com/gofaster/#/buildcharts
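
For anyone who wants to do the same kind of data-staring without the
dashboard, a rough sketch of that style of chart built from job
start/end times (matplotlib; the records are made up, and this is not
how the gofaster page itself is implemented):

  # Minimal build-chart sketch: one horizontal bar per job, placed by its
  # start/end time relative to the push. The records below are invented.
  import matplotlib.pyplot as plt

  jobs = [
      ("linux build",     0,  60),
      ("win32 build",     0,  95),
      ("linux mochitest", 62, 110),
      ("win32 reftest",   97, 150),
  ]

  fig, ax = plt.subplots()
  for i, (name, start, end) in enumerate(jobs):
      ax.barh(i, end - start, left=start)
  ax.set_yticks(range(len(jobs)))
  ax.set_yticklabels([name for name, _, _ in jobs])
  ax.set_xlabel("minutes since push")
  plt.tight_layout()
  plt.show()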

I'm not sure if the data feeding into that is still valid (some things
look suspiciously low, and at the very least it doesn't seem completely
up to date). Anyway, if I were going to look into this again (I don't
have time right now, unfortunately), I would first spend a lot of
time staring at data. :)

This looks great, William. But looking at how our load has been for the past few weeks, I don't think we're going to benefit much from incremental improvements to end-to-end times.

Honestly, the only big thing we can probably fix to improve our end-to-end times is to enable pymake on our Windows builders so they can do parallel builds. Developers on Windows have been using pymake to get parallel builds for quite a while now, and somebody needs to figure out what's happening on our build machines that prevents us from using pymake there, and fix it. That should significantly decrease our Windows build times, depending on the number of cores available on our Windows builders.
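
As a rough sense of how much the core count could matter, an Amdahl-style back-of-envelope (the two-hour serial build time and the 15% non-parallelizable fraction are assumptions for illustration, not measurements of our Windows builders):

  # Back-of-envelope estimate of parallel build time on an N-core builder.
  # Both input numbers are assumptions for illustration, not measurements.
  serial_build_minutes = 120.0
  serial_fraction = 0.15   # linking and other steps that don't parallelize

  for cores in (1, 2, 4, 8):
      speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
      print("-j%d: ~%.0f minutes" % (cores, serial_build_minutes / speedup))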

Any other low-hanging fruit I can think of amounts to small incremental improvements which, although very nice, stand no chance against the rate at which our load is increasing. So unfortunately I don't see any way to address the problem we're facing in the short term other than adding hardware.

Cheers,
Ehsan
