A week later, and several bug-fixes to the dpb engine.

I really needed to nail down the way dependencies are
handled, especially with respect to locks.

A few errors have been fixed, mostly leading to race
conditions.

Current situation:
- affinity now records mfs use correctly, so that ports
shouldn't build twice.
- dependencies and affinity are handled more aggressively.
Specifically, dependencies don't matter during clean, so
that junk will catch a bit more stuff. And affinity doesn't
kick in until after 'make checksum', as it's stupid to record
affinity on stuff that hasn't actually started building.
- kde3/4 errors won't stick to a host. As a special exception,
the lock gets *cleansed*, so that the host can be untainted,
and the build will proceed to completion.
- there's a bit more smarts in the junk decision. Most recently,
the presence of a kde3/kde4 tag will trigger a "first junk" phase
on a host, thus making the wiping out of dependencies before
restarting an interrupted build unnecessary.
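
To make the affinity rule above concrete, here is a rough sketch in Python
(not dpb's actual Perl code). The step names, the Affinity class and its
methods are all made up for illustration; the only thing taken from the
description above is that affinity is recorded once 'make checksum' is done,
and not before.

```python
# Hypothetical sketch: none of these names come from dpb itself.
STEPS = ["fetch", "checksum", "prepare", "build", "package", "clean"]


class Affinity:
    def __init__(self):
        # pkgpath -> host on which the build actually started
        self.host_for = {}

    def step_finished(self, pkgpath, host, step):
        # Record nothing until 'checksum' has completed: before that point
        # the port has not really started building on this host.
        if STEPS.index(step) >= STEPS.index("checksum"):
            # Keep the first host recorded; later steps don't move it.
            self.host_for.setdefault(pkgpath, host)

    def build_done(self, pkgpath):
        # Once the port is built, the affinity marker is no longer needed.
        self.host_for.pop(pkgpath, None)

    def preferred_host(self, pkgpath):
        return self.host_for.get(pkgpath)
```

With this, a port interrupted during fetch can restart anywhere, while one
interrupted after checksum would prefer the host it started on.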

Some unrelated issues still remain, and there is work to be done:
- dpb can lose its connection to the ssh master, notice it, and
still keep the slave around... I think this is new. I'm aware
of the issue, I'll try to get a handle on it.
- I still don't know why dpb spins at some ends of fetch.
This is currently worked around, but I'd prefer figuring it
out.
- hosts in a cluster must be up at start for init to happen.
dpb won't init machines that come up too late.  Fixing this
involves triggering an init job when the host comes up. This
is definitely possible, but it requires testing.
- I'm trying to get a bit *less* code to load when it isn't
necessary. Knowing we're in the 'one host' situation helps.
Specifically, memory heuristics and affinity handling are
things we don't need to have on a single host, or when no
build in memory is involved.  The intricate speed-factor
queues are not needed if there's no difference in speed-factors
on a cluster...
- I should interleave host init with reading up build-stats
from disks, especially in the presence of somewhat long startup
scripts...  this probably means making report a bit more dynamic
so the user knows what's going on.
- haven't had time to finish chroot on localhost, still requires
manual chroot'ing and then starting dpb...
- the queue has too many instances of libreoffice. This more or
less kills speed-factor situations, as libreoffice takes so many
slots it's guaranteed to start on almost any host... rearranging
multi-packages as blobs that show up once in the queue would
help.... it should also reduce the queue size by half, thus making
sorting more efficient, but it's way more complicated for the
engine...
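
As an illustration of the "blob" idea in the last item, here is a minimal
sketch in Python (again, not dpb's Perl). It assumes queue entries can be
seen as (port, subpackage) pairs; that representation, the function name and
the example subpackage names are all hypothetical.

```python
# Hypothetical sketch: collapse a queue of (port, subpackage) entries so
# that each port shows up only once, carrying all of its subpackages.
def group_into_blobs(queue):
    blobs = {}  # port -> list of subpackages, in first-seen order
    for port, subpackage in queue:
        blobs.setdefault(port, []).append(subpackage)
    return blobs


queue = [
    ("editors/libreoffice", "-main"),
    ("editors/libreoffice", "-kde"),
    ("shells/zsh", "-main"),
    ("editors/libreoffice", "-java"),
]
blobs = group_into_blobs(queue)
# libreoffice now occupies a single slot instead of three
```

The grouping itself is the easy part; the complication for the engine is
that a single entry then stands for several packages to track.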
