As Allen mentions, Docker is the big offender.  I've added one cleanup
job, `docker system prune -a -f`, to run hourly.  The problem nodes are
the qnodes, which have much less disk space for Docker than the other
nodes.  I'm disabling them for the time being until I can either get a
bigger disk or guarantee they don't run out of space weekly.
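For reference, the hourly job is just a cron entry along these lines
(the file path, user, and log destination here are illustrative, not
the actual config on the nodes):

```
# /etc/cron.d/docker-prune (illustrative path)
# Hourly: remove all unused containers, images, and networks.
0 * * * * root /usr/bin/docker system prune -a -f >> /var/log/docker-prune.log 2>&1
```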

On Thu, Oct 12, 2017 at 11:23 AM, Allen Wittenauer
<[email protected]> wrote:
>
>> On Oct 12, 2017, at 8:34 AM, Robert Munteanu <[email protected]> wrote:
>> Jenkins slaves running out of disk space has been an issue for quite
>> some time. Not a major deal-breaker or very frequent, but it's still
>> annoying to chase issues, reconfigure slave labels, retrigger builds,
>> etc.
>
>
>         From what I’ve seen, the biggest issues are caused by broken docker 
> jobs.  I don’t think people realize that when their docker jobs fail, the 
> disk space and container aren’t released. (Docker only automatically cleans 
> up on *success*!)  Apache Yetus has tools to deal with old docker bits on the 
> system. As a result, on the ‘hadoop’ labeled machines (which have multiple 
> projects using Yetus precommit in sentinel mode), I don’t think I’ve seen an 
> out-of-space failure on those nodes in a very long time.
>
>         Apache Yetus itself is configured to run on quite a few nodes.  When 
> the (rare) patch comes through that runs on a node that isn’t typically 
> running Yetus, it isn’t unusual to see months worth of images eating space 
> and containers still running.  It will then wipe out a bunch of the excess.  
> I should probably add df (and cpu time?) output to see how much it is 
> reclaiming.  In some cases I’ve seen, it’s easily in the high GB area.
>
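
Logging df output around the cleanup, as Allen suggests, could be as
simple as the following sketch.  This is a hypothetical wrapper, not
anything currently deployed; `DOCKER_ROOT` is an assumption about
where Docker keeps its data on these nodes.

```shell
#!/bin/sh
# Hypothetical wrapper around the hourly prune: log how much disk
# space each run reclaims.  DOCKER_ROOT is an assumed location for
# Docker's data directory; adjust per node.
DOCKER_ROOT=${DOCKER_ROOT:-/var/lib/docker}

# Free kilobytes on the filesystem holding $1.
free_kb() { df -k "$1" | awk 'NR==2 {print $4}'; }

# MB reclaimed between a before ($1) and after ($2) reading, in KB.
reclaimed_mb() { echo $(( ($2 - $1) / 1024 )); }

# Only attempt the prune where docker and its data dir actually exist.
if command -v docker >/dev/null 2>&1 && [ -d "$DOCKER_ROOT" ]; then
    before=$(free_kb "$DOCKER_ROOT")
    docker system prune -a -f
    after=$(free_kb "$DOCKER_ROOT")
    echo "prune reclaimed $(reclaimed_mb "$before" "$after") MB on $DOCKER_ROOT"
fi
```

Appending that echo line to the cleanup log would make it easy to see
whether a node is reclaiming megabytes or the "high GB" Allen mentions.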
