A week later, and several bug fixes to the dpb engine. I really needed to nail down the way dependencies are handled, especially with respect to locks.
A few errors have been fixed, mostly ones leading to race conditions.

Current situation:
- affinity now records mfs use correctly, so that ports shouldn't build twice.
- dependencies and affinity are handled more aggressively. Specifically, dependencies don't matter during clean, so that junk will catch a bit more stuff. And affinity doesn't kick in until after 'make checksum', as it's stupid to record affinity on stuff that hasn't actually started building.
- kde3/4 errors won't stick to a host. As a special exception, the lock gets *cleansed*, so that the host can be untainted, and the build will proceed to completion.
- there's a bit more smarts in the junk decision. Most recently, the presence of a kde3/kde4 tag will trigger a "first junk" phase on a host, thus making the wiping out of dependencies before restarting an interrupted build unnecessary.

Some unrelated issues still remain, and there is work to be done.
- dpb can lose connection to the ssh master and notice, and keep the slave around still... I think this is new. I'm aware of the issue, I'll try to get a handle on it.
- I still don't know why dpb spins at some ends of fetch. This is currently worked around, but I'd prefer figuring it out.
- hosts in a cluster must be up at start for init to happen. dpb won't init machines that come up too late. Fixing this involves triggering an init job when the host comes up. This is definitely possible, but it requires testing.
- I'm trying to get a bit *less* code to load when it isn't necessary. Knowing we're in the 'one host' situation helps. Specifically, memory heuristics and affinity handling are things we don't need to have on a single host, or when no build in memory is involved. The intricate speed-factor queues are not needed if there's no difference in speed-factors on a cluster...
- I should interleave host init with reading up build-stats from disks, especially in the presence of somewhat long startup scripts... this probably means making the report a bit more dynamic so the user knows what's going on.
- haven't had time to finish chroot on localhost, it still requires manual chroot'ing and then starting dpb...
- the queue has too many instances of libreoffice. This more or less kills speed-factor situations, as libreoffice takes so many slots it's guaranteed to start on almost any host... rearranging multi-packages as blobs that show up once in the queue would help... it should also reduce the queue size by half, thus making sorting more efficient, but it's way more complicated for the engine...
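
To make the last point a bit more concrete, here is a rough sketch of what "multi-packages as blobs" could look like. This is Python for illustration only, not dpb's actual Perl code. It assumes queue entries are (pkgpath, weight) pairs, that subpackages are written as a ",-sub" suffix on the pkgpath, and that a blob takes the largest weight of its members. The names build_key and collapse_queue, the subpackage names and the weights are all made up for the example.

```python
def build_key(pkgpath):
    """Drop ',-subpackage' components so all subpackages of one build share a key.

    Flavors (components that do not start with '-') are kept, since
    different flavors are assumed to be different builds.
    """
    parts = pkgpath.split(",")
    return ",".join(p for p in parts if not p.startswith("-"))


def collapse_queue(entries):
    """Group (pkgpath, weight) entries into one blob per build key.

    Returns a list of (key, weight, members) tuples, heaviest first.
    Taking the max weight per blob is an assumption of this sketch.
    """
    blobs = {}
    for pkgpath, weight in entries:
        blob = blobs.setdefault(build_key(pkgpath), {"weight": 0, "members": []})
        blob["weight"] = max(blob["weight"], weight)
        blob["members"].append(pkgpath)
    # sorted() is stable, so blobs of equal weight keep first-seen order.
    return sorted(
        ((key, b["weight"], b["members"]) for key, b in blobs.items()),
        key=lambda blob: -blob[1],
    )


# Hypothetical queue: three subpackages of one build plus an unrelated port.
queue = [
    ("editors/libreoffice,-main", 900),
    ("devel/gmake", 5),
    ("editors/libreoffice,-kde", 900),
    ("editors/libreoffice,-java", 900),
]
collapsed = collapse_queue(queue)
```

With this toy input the queue goes from four entries to two, and libreoffice only shows up once at the head. The hard part, which the sketch ignores completely, is that the engine then has to track dependencies and locks per blob rather than per pkgpath.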