Hello,
  We've reached a point where we have too many build slaves and too many
users to get good performance/latency out of our current buildbot waterfall
and console view pages.

  We've been tweaking the page a lot lately to make it faster to load, but
the number of new slaves and new users added every day makes it slow down
faster than we can address the problem. This morning buildbot was completely
unresponsive because it could not keep up with the demand.

  To bring it back to life, I disabled two features which account for about
50% of all traffic: the buildbot Chrome extension, and the three overview
bars at the top of the waterfall. This should keep it online for a while,
but it is not ideal. It's time to think of a better solution.

  The underlying problems with buildbot are the database format, which is
just hundreds of thousands of files on the hard drive with no "seek"
capability, and the fact that the web server itself is single-threaded.

  We currently have 63 slaves on our main waterfall. I think this is more
than buildbot can really support. We would ideally need to split it.

  Q1: What kind of split would you prefer? mac/linux/windows,
chromium/webkit/modules, full/windows/mac/linux/memory, etc.?

  The main buildbot page would most likely become a bunch of iframes
displaying all the slaves at the same time. The console view integration
might be a little less nice. If anyone with web development experience
wants to help, we could modify the current waterfall to fetch only JSON
data from each buildbot and merge it together, client side, to get a
combined view.
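  The client-side merge could look roughly like the sketch below. The
/json/builds endpoint name and the startTime field are illustrative
assumptions, not the actual buildbot JSON API:

```javascript
// Merge build lists from several masters into one list, newest first.
function mergeWaterfalls(perMasterBuilds) {
  return perMasterBuilds
    .flat()                                     // concatenate all masters' builds
    .sort((a, b) => b.startTime - a.startTime); // newest first
}

// Fetch each master's JSON in parallel, then merge client side.
// Endpoint name is a placeholder for whatever the masters would expose.
async function fetchCombinedView(masterUrls) {
  const perMaster = await Promise.all(
      masterUrls.map(url =>
          fetch(url + '/json/builds').then(resp => resp.json())));
  return mergeWaterfalls(perMaster);
}
```

  The point is that each master only serves cheap static-ish JSON, and the
expensive page assembly moves into the browser.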

  Q2: How many changes do we need to display on the console view?

  We are currently displaying the last 50 changes, which usually covers
about half a day. If people don't mind, we could scale back to 30. This
would make the page a little faster to load.

  Q3: What kind of auto-refresh do we need?

  We were at 60 seconds for a long time, and I changed it a couple of weeks
ago to 90 seconds. No one complained, so I guess this is good. Should we go
even higher than that?

  Q4: How much build history do we need?

  Right now stdio logs are kept for 3 weeks and build results (green, red)
are kept for 1 month. Older build results are archived but can't be
accessed directly by buildbot.

  If you have any other suggestions, please let me know!

Some things we can't do:
- Get a better machine. It's already running on a dedicated dual quad-core
  Nehalem server with 24 GB of RAM and 15k rpm drives.
- Change buildbot to use a non-single-threaded web server. This is far too
  involved.

*WHAT I NEED YOUR HELP WITH:*

1. No more scraping of the waterfall, please! If you need to crawl the
    logs, let me know and I can run your script on the database directly.
2. If you know about Apache mod_cache / mod_proxy and want to help, please
    let me know. build.chromium.org is a proxied cache of the real buildbot
    server, and the cache does not work well. This contributes another
    25-30% of the overall load on the buildbot.
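For anyone curious what that proxy side looks like, a minimal
reverse-proxy-with-disk-cache setup is sketched below. This is an
illustration only, assuming Apache 2.2 with mod_proxy and mod_disk_cache
loaded; the hostname, port, and paths are placeholders, not our actual
config:

```apache
# Reverse-proxy all requests to the internal buildbot master.
# "internal-buildbot:8010" is a placeholder hostname/port.
ProxyPass        / http://internal-buildbot:8010/
ProxyPassReverse / http://internal-buildbot:8010/

# Cache responses on disk in front of the single-threaded backend.
CacheEnable disk /
CacheRoot /var/cache/apache2/buildbot

# Buildbot pages don't send useful Expires headers, so fall back to a
# short default lifetime instead of hitting the backend on every request.
CacheDefaultExpire 60
```

The tuning problem is exactly this last part: deciding which pages are safe
to serve slightly stale so the backend stops being hit for every view.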

Thanks!

Nicolas

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
