> On Oct 12, 2017, at 8:34 AM, Robert Munteanu <[email protected]> wrote:
> Jenkins slaves running out of disk space has been an issue for quite
> some time. Not a major deal-breaker or very frequent, but it's still
> annoying to chase issues, reconfigure slave labels, retrigger builds,
> etc
From what I’ve seen, the biggest issues are caused by broken docker
jobs. I don’t think people realize that when their docker jobs fail, the
container and its disk space aren’t released. (Docker only cleans up
automatically on *success*!) Apache Yetus has tools to deal with old docker
bits on the system. As a result, on the ‘hadoop’-labeled machines (which have
multiple projects using Yetus precommit in sentinel mode), I don’t think I’ve
seen an out-of-space failure on those nodes in a very long time.
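For nodes that don’t have Yetus sweeping up automatically, a manual pass is
roughly the sketch below. These are standard Docker CLI prune commands; the
48-hour age threshold and the dry-run wrapper are my own assumptions for
illustration, not what Yetus actually does:

```shell
#!/bin/sh
# Sketch: reclaim space left behind by failed docker jobs.
# DRY_RUN=1 (the default here) just prints what would be run.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Remove stopped containers older than 48h -- typically the leftovers
# of jobs that failed and never cleaned up after themselves.
run docker container prune --force --filter "until=48h"

# Remove images (including non-dangling ones) older than 48h.
run docker image prune --all --force --filter "until=48h"
```

Run with DRY_RUN=0 once you’re happy with what it reports.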
Apache Yetus itself is configured to run on quite a few nodes. When
the (rare) patch comes through that runs on a node that isn’t typically running
Yetus, it isn’t unusual to see months’ worth of images eating space and
containers still running. Yetus will then wipe out a bunch of the excess. I
should probably add df (and cpu time?) output to see how much it is reclaiming;
in some cases I’ve seen, it’s easily in the high-GB range.
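Reporting that reclamation could be as simple as a df before/after around the
cleanup step, something like the sketch below. The mount point (".") and the
placeholder cleanup step are assumptions for the example, not Yetus code:

```shell
#!/bin/sh
# Sketch: report disk space reclaimed by a cleanup step.

mount_point="."   # assumption; a real node would use the docker storage mount

free_kb() {
  # POSIX df -P: second line, 4th column is available space in KB
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

before=$(free_kb "$mount_point")
# ... run the docker cleanup here (omitted) ...
after=$(free_kb "$mount_point")

echo "reclaimed: $(( (after - before) / 1024 )) MB"
```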