> On Oct 12, 2017, at 8:34 AM, Robert Munteanu <[email protected]> wrote:
> Jenkins slaves running out of disk space has been an issue for quite
> some time. Not a major deal-breaker or very frequent, but it's still
> annoying to chase issues, reconfigure slave labels, retrigger builds,
> etc
From what I’ve seen, the biggest issues are caused by broken docker
jobs. I don’t think people realize that when their docker jobs fail, the
container and its disk space aren’t released. (Docker only cleans up
automatically on *success*!) Apache Yetus has tools to deal with old docker
bits on the system. As a result, on the ‘hadoop’-labeled machines (which have
multiple projects using Yetus precommit in sentinel mode), I don’t think I’ve
seen an out-of-space failure on those nodes in a very long time.
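For nodes that don’t have Yetus sweeping up automatically, a manual pass is
roughly the sketch below. These are standard Docker CLI prune commands; the
48-hour age threshold and the dry-run wrapper are my own assumptions for
illustration, not what Yetus actually does:

```shell
#!/bin/sh
# Sketch: reclaim space left behind by failed docker jobs.
# DRY_RUN=1 (the default here) just prints what would be run.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Remove stopped containers older than 48h -- typically the leftovers
# of jobs that failed and never cleaned up after themselves.
run docker container prune --force --filter "until=48h"

# Remove images (including non-dangling ones) older than 48h.
run docker image prune --all --force --filter "until=48h"
```

Run with DRY_RUN=0 once you’re happy with what it reports.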
Apache Yetus itself is configured to run on quite a few nodes. When
the (rare) patch comes through that runs on a node that isn’t typically running
Yetus, it isn’t unusual to see months’ worth of images eating space and
containers still running. Yetus will then wipe out a bunch of the excess. I
should probably add df (and cpu time?) output to see how much it is reclaiming;
in some cases I’ve seen, it’s easily in the high-GB range.
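Reporting that reclamation could be as simple as a df before/after around the
cleanup step, something like the sketch below. The mount point (".") and the
placeholder cleanup step are assumptions for the example, not Yetus code:

```shell
#!/bin/sh
# Sketch: report disk space reclaimed by a cleanup step.

mount_point="."   # assumption; a real node would use the docker storage mount

free_kb() {
  # POSIX df -P: second line, 4th column is available space in KB
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

before=$(free_kb "$mount_point")
# ... run the docker cleanup here (omitted) ...
after=$(free_kb "$mount_point")

echo "reclaimed: $(( (after - before) / 1024 )) MB"
```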