Tests may not be doing Docker cleanup. The inventory job runs a docker
prune every 12 hours, removing images older than 24 hours [1]. Looking at
one of the recent runs at random [2], it cleaned up a long list of
containers consuming 30+ GB of space. That should be just 12 hours' worth
of containers.

[1]
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
[2]
https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
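
For context, the prune that the inventory job runs amounts to something
like the following (a sketch inferred from [1]; the exact flags are in the
groovy file):

# Remove stopped containers and unused images older than 24 hours.
docker system prune --all --force --filter "until=24h"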

On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com> wrote:

> Yes, these are on the same volume in the /var/lib/docker directory. I'm
> unsure if they clean up leftover images.
>
> On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>
>> I forgot Docker images:
>>
>> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> TYPE            TOTAL    ACTIVE    SIZE       RECLAIMABLE
>> Images          88       9         125.4GB    124.2GB (99%)
>> Containers      40       4         7.927GB    7.871GB (99%)
>> Local Volumes   47       0         3.165GB    3.165GB (100%)
>> Build Cache     0        0         0B         0B
>>
>> There are about 90 images on that machine, with all but 1 less than 48
>> hours old.
>> I think the Docker test jobs need to try harder at cleaning up their
>> leftover images (assuming they attempt any cleanup at all).
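>>
>> As a hypothetical sketch (not something the jobs necessarily do today),
>> a docker-based test could tag its image per build and remove it when
>> the job exits:
>>
>> IMAGE_TAG="beam-test-${BUILD_NUMBER:-local}"  # BUILD_NUMBER is set by Jenkins
>> trap 'docker rmi --force "$IMAGE_TAG" || true' EXIT  # clean up even on failure
>> docker build -t "$IMAGE_TAG" .
>> # ... run tests against "$IMAGE_TAG" ...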
>>
>> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> The additional executor slots (the @2/@3 workspace directories) take up
>>> even more space now than before.
>>>
>>> I'm testing out https://github.com/apache/beam/pull/12326, which could
>>> help by cleaning up workspaces after a run (just started a seed job).
>>>
>>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com>
>>> wrote:
>>>
>>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>>>> 6.2G    beam_PreCommit_Python_Commit
>>>> 7.5G    beam_PreCommit_Python_Commit@2
>>>> 7.5G    beam_PreCommit_Python_Cron
>>>> 1012M   beam_PreCommit_PythonDocker_Commit
>>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>>>> 1002M   beam_PreCommit_PythonDocker_Cron
>>>> 877M    beam_PreCommit_PythonFormatter_Commit
>>>> 988M    beam_PreCommit_PythonFormatter_Cron
>>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>>>> 1.7G    beam_PreCommit_PythonLint_Commit
>>>> 2.1G    beam_PreCommit_PythonLint_Cron
>>>> 7.5G    beam_PreCommit_Python_Phrase
>>>> 346M    beam_PreCommit_RAT_Commit
>>>> 341M    beam_PreCommit_RAT_Cron
>>>> 338M    beam_PreCommit_Spotless_Commit
>>>> 339M    beam_PreCommit_Spotless_Cron
>>>> 5.5G    beam_PreCommit_SQL_Commit
>>>> 5.5G    beam_PreCommit_SQL_Cron
>>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>>>> 750M    beam_PreCommit_Website_Commit
>>>> 750M    beam_PreCommit_Website_Commit@2
>>>> 750M    beam_PreCommit_Website_Cron
>>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>>>> 336M    beam_Prober_CommunityMetrics
>>>> 693M    beam_python_mongoio_load_test
>>>> 339M    beam_SeedJob
>>>> 333M    beam_SeedJob_Standalone
>>>> 334M    beam_sonarqube_report
>>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>>> 175G    total
>>>>
>>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tyso...@google.com>
>>>> wrote:
>>>>
>>>>> Ya, looks like something in the workspaces is taking up room:
>>>>>
>>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>>>> 191G    .
>>>>> 191G    total
>>>>>
>>>>>
>>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <tyso...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>>>>>
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> /dev/sda1       485G  482G  2.9G 100% /
>>>>>>
>>>>>> However, after cleaning up /tmp with the crontab command, /tmp shows
>>>>>> only 8G of usage, yet the partition remains 100% full:
>>>>>>
>>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>>>>> 8.0G    /tmp
>>>>>> 8.0G    total
>>>>>>
>>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>>>>> directory. When I run a du on that, it takes a really long time. I'll
>>>>>> let it keep running for a while to see if it ever returns a result,
>>>>>> but so far this seems suspect.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <tyso...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
>>>>>>> workspaces, or what are they named?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I'm curious what you find. Was it /tmp or the workspaces using up
>>>>>>>> the space?
>>>>>>>>
>>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <tyso...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Bleck. I just realized that the node is 'offline', so that won't
>>>>>>>>> work. I'll clean up manually on the machine using the cron command.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>>>>>>>>> tyso...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Something isn't working with the current setup, because node 15
>>>>>>>>>> appears to be out of space and is currently 'offline' according to
>>>>>>>>>> Jenkins. Can someone run the cleanup job? The machine is full:
>>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>> udev             52G     0   52G   0% /dev
>>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>>>>>>>>>>
>>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>>>>>>> 20G     2020-07-24 17:52        .
>>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
>>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>>>>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>>>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>>>>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>>>>>>>>>>> tyso...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
>>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the workspace 
>>>>>>>>>>>> [1]:
>>>>>>>>>>>>
>>>>>>>>>>>> Procedures Projects can take to clean up disk space
>>>>>>>>>>>>
>>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
>>>>>>>>>>>> steps to help clean up their jobs after themselves on the build 
>>>>>>>>>>>> nodes.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>>>>>>>>>>>>    cleaned up when job workspaces expire.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Tests should be (able to be) written to use the standard
>>>>>>>>>>> temporary file mechanisms, with the environment on Jenkins set up
>>>>>>>>>>> so that temporary files fall into the respective workspaces.
>>>>>>>>>>> Ideally this should be as simple as setting the TMPDIR (or
>>>>>>>>>>> similar) environment variable (and making sure it exists and is
>>>>>>>>>>> writable).
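>>>>>>>>>>>
>>>>>>>>>>> A minimal sketch of that setup, assuming the Jenkins-provided
>>>>>>>>>>> WORKSPACE variable (hypothetical; not what our jobs do today):
>>>>>>>>>>>
>>>>>>>>>>> export TMPDIR="$WORKSPACE/tmp"  # Python's tempfile et al. honor TMPDIR
>>>>>>>>>>> mkdir -p "$TMPDIR"
>>>>>>>>>>> # JVM code would additionally need -Djava.io.tmpdir="$TMPDIR".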
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    2. Configure your jobs to wipe workspaces on start or finish.
>>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous builds.
>>>>>>>>>>>>    4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>>>>>>>>>>>> k...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Those file listings look like the result of using standard
>>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>>>>>>>>>>>>> tyso...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two 
>>>>>>>>>>>>>> examples:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>>>>>>>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>>>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>>>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>>>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>>>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>>>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>>>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>>>>>>>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>>>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>>>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>>>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>>>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>>>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>>>>>>>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>>>>>>>>>>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>>>>>>>>>>>>>>> k...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR
>>>>>>>>>>>>>>>> to point somewhere inside the job directory that will be
>>>>>>>>>>>>>>>> wiped by Jenkins, and I would expect code to always create
>>>>>>>>>>>>>>>> temp files via APIs that respect this. Is Jenkins not
>>>>>>>>>>>>>>>> cleaning up? Do we not have the ability to set this up? Do we
>>>>>>>>>>>>>>>> have bugs in our code (that we could probably find by setting
>>>>>>>>>>>>>>>> TMPDIR to somewhere not-/tmp and running the tests without
>>>>>>>>>>>>>>>> write permission to /tmp, etc.)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>>>>>>>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>>>>>>>>>>>>>>>> <eh...@google.com> previously filed a relevant issue (
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for
>>>>>>>>>>>>>>>>> cleaning up the workspace directory after successful jobs.
>>>>>>>>>>>>>>>>> Alternatively, we can consider periodically cleaning up the
>>>>>>>>>>>>>>>>> /src directories.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>>>>>>>>>>>>>>>>> scripts to the inventory job (
>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source
>>>>>>>>>>>>>>>>> tree, adjust frequencies, and clean up the code with PRs. I
>>>>>>>>>>>>>>>>> do not know how the internal cron scripts are created and
>>>>>>>>>>>>>>>>> maintained, or how they would be recreated for new worker
>>>>>>>>>>>>>>>>> instances.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>>>>>>>>>>>>>>>>> damian.gadom...@polidea.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>>>>>>>>>>>>>>>>>> triggered by cron and should be a last-resort solution for
>>>>>>>>>>>>>>>>>> unusual cases.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>>>>>>>>>> internal cron script. It runs once a week and deletes all
>>>>>>>>>>>>>>>>>> the files (and only files) that have not been accessed for
>>>>>>>>>>>>>>>>>> at least three days. That's designed to be as safe as
>>>>>>>>>>>>>>>>>> possible for the jobs running on the worker (it won't
>>>>>>>>>>>>>>>>>> delete files that are still in use), and also to be
>>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine: the
>>>>>>>>>>>>>>>>>> cleanup will always happen, even if some long-running or
>>>>>>>>>>>>>>>>>> stuck jobs are blocking the machine.
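>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For reference, that access-time-based cleanup amounts to
>>>>>>>>>>>>>>>>>> something like this (a sketch; the exact script on the
>>>>>>>>>>>>>>>>>> workers may differ):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Weekly crontab entry: delete files (never directories)
>>>>>>>>>>>>>>>>>> # under /tmp that have not been accessed for 3+ days.
>>>>>>>>>>>>>>>>>> 0 6 * * 1 find /tmp -type f -atime +3 -delete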
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I also think that the current "No space left" errors may be
>>>>>>>>>>>>>>>>>> a consequence of the growing workspace directory rather than
>>>>>>>>>>>>>>>>>> /tmp. I didn't do any detailed analysis, but currently on
>>>>>>>>>>>>>>>>>> apache-beam-jenkins-7, for example, the workspace directory
>>>>>>>>>>>>>>>>>> is 158 GB while /tmp is only 16 GB. We should either
>>>>>>>>>>>>>>>>>> guarantee enough disk to hold the workspaces of all jobs
>>>>>>>>>>>>>>>>>> (because eventually every worker will execute each job) or
>>>>>>>>>>>>>>>>>> also clear the workspaces in some way.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>>>>>>>>>>>>>>>>>> m...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to
>>>>>>>>>>>>>>>>>>> test failures while running. I'm not a Jenkins expert, but
>>>>>>>>>>>>>>>>>>> maybe there is a notion of running exclusively while no
>>>>>>>>>>>>>>>>>>> other tasks are running?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Max
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>>>>>>>>>>>>>>>>>>> > beam_Clean_tmp_directory
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>>>>>>>>>> > out-of-disk related errors in precommit tests currently;
>>>>>>>>>>>>>>>>>>> > perhaps we should schedule this job with cron?
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for example:
>>>>>>>>>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>>>>>>>>>>>>>>>>>>> amyrv...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
>>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>>>>>>>>>>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one
>>>>>>>>>>>>>>>>>>> >>>> time cleanup. I agree that we need to have a solution
>>>>>>>>>>>>>>>>>>> >>>> to automate this task or address the root cause of
>>>>>>>>>>>>>>>>>>> >>>> the buildup.
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>>>>>>>>>>>>>>>>>>> michal.wale...@polidea.com>
>>>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers
>>>>>>>>>>>>>>>>>>> >>>>> again. Nodes 1 and 7 both fail jobs with "No space
>>>>>>>>>>>>>>>>>>> >>>>> left on device". Who is the best person to contact in
>>>>>>>>>>>>>>>>>>> >>>>> these cases (someone with access permissions to the
>>>>>>>>>>>>>>>>>>> >>>>> workers)?
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and
>>>>>>>>>>>>>>>>>>> >>>>> more frequent recently, and I'd like to discuss how
>>>>>>>>>>>>>>>>>>> >>>>> this can be remedied. Can a cleanup task be automated
>>>>>>>>>>>>>>>>>>> >>>>> on Jenkins somehow?
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Regards
>>>>>>>>>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
