I agree with Udi: workspaces seem to be the third culprit, and they are not
addressed in any way until PR #12326 <https://github.com/apache/beam/pull/12326>
is merged. I feel that it'll solve the issue of disks filling up for a
long time ;)

I'm also OK with moving the /tmp cleanup to option B, and will happily
investigate a proper TMPDIR config.
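
To make the option B cleanup concrete, here is a minimal sketch of an
age-based /tmp sweep, assuming the same rule the current cron script uses
(delete files, and only files, that were not accessed for N days). The
function names and the dry_run switch are hypothetical and are not taken
from the inventory job; directories are deliberately left in place.

```python
import os
import stat
import time


def find_stale_files(root, max_age_days=2, now=None):
    """Return regular files under root whose last access is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # the file disappeared while we were scanning
            # Only regular files qualify; symlinks and special files are skipped.
            if stat.S_ISREG(info.st_mode) and info.st_atime < cutoff:
                stale.append(path)
    return sorted(stale)


def delete_stale_files(root, max_age_days=2, dry_run=True):
    """Delete stale files under root; with dry_run=True only report them."""
    removed = []
    for path in find_stale_files(root, max_age_days):
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # already gone or not removable; skip it
        removed.append(path)
    return removed
```

Calling delete_stale_files("/tmp", max_age_days=2) only lists candidates;
passing dry_run=False actually removes them. Keeping the age threshold as a
parameter makes the "3 days to 2 days" change a one-line edit.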

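As a starting point for the TMPDIR investigation, this is a rough sketch of
pointing the standard temp-file machinery at a tmp directory inside the job
workspace, so that it gets wiped together with the workspace. The helper name
use_workspace_tmpdir and the "tmp" subdirectory are assumptions of mine; only
os and tempfile from the Python standard library are used.

```python
import os
import tempfile


def use_workspace_tmpdir(workspace):
    """Create <workspace>/tmp and make it the default temp directory."""
    tmpdir = os.path.abspath(os.path.join(workspace, "tmp"))
    os.makedirs(tmpdir, exist_ok=True)
    if not os.access(tmpdir, os.W_OK):
        raise PermissionError("temp directory is not writable: " + tmpdir)
    # Child processes (pip, JVMs started by tests, ...) inherit this variable.
    os.environ["TMPDIR"] = tmpdir
    # tempfile caches its choice; reset it so TMPDIR is consulted again.
    tempfile.tempdir = None
    return tmpdir
```

On a Jenkins agent this would presumably be called with the job's workspace
path, e.g. use_workspace_tmpdir(os.environ["WORKSPACE"]). Code that builds
"/tmp/..." paths by hand would still bypass it, which is the kind of bug Kenn
suggested hunting for.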


On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:

> What about the workspaces, which can take up 175GB in some cases (see
> above)?
> I'm working on getting them cleaned up automatically:
> https://github.com/apache/beam/pull/12326
>
> My opinion is that we would get more mileage out of fixing the jobs that
> leave behind files in /tmp and images/containers in Docker.
> This would also help keep development machines clean.
>
>
> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <tyso...@google.com> wrote:
>
>> Here is a summary of how I understand things:
>>
>>   - /tmp and /var/lib/docker are the culprits for filling up disks
>>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
>> clean up images older than 24hr
>>   - crontab on each machine cleans up /tmp files older than three days
>> weekly
>>
>> This doesn't seem to be working since we're still running out of disk
>> periodically and requiring manual intervention. Knobs and options we have
>> available:
>>
>>   1. increase frequency of deleting files
>>   2. decrease the number of days required to delete a file (e.g. older
>> than 2 days)
>>
>> The execution methods we have available are:
>>
>>   A. cron
>>     - pro: runs even if a job gets stuck in Jenkins due to full disk
>>     - con: config baked into VM which is tough to update, not
>> discoverable or documented well
>>   B. inventory job
>>     - pro: easy to update, runs every 12h already
>>     - con: could get stuck if Jenkins agent runs out of disk or is
>> otherwise stuck, tied to all other inventory job frequency
>>   C. configure startup scripts for the VMs that set up the cron job
>> anytime the VM is restarted
>>     - pro: similar to A. and easy to update
>>     - con: similar to A.
>>
>> Between the three I prefer B. because it is consistent with other
>> inventory jobs. If stuck jobs turn out to block scheduling of the
>> inventory job often, we could further investigate C to avoid having to
>> rebuild the VM images repeatedly.
>>
>> Any objections or comments? If not, we'll go forward with B. and reduce
>> the date check from 3 days to 2 days.
>>
>>
>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>> > Tests may not be doing docker cleanup. Inventory job runs a docker prune
>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at one of
>> > the recent runs [2], it cleaned up a long list of containers consuming
>> > 30+GB space. That should be just 12 hours worth of containers.
>> >
>> > [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>> > [2] https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>> >
>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com> wrote:
>> >
>> > > Yes, these are on the same volume in the /var/lib/docker directory. I'm
>> > > unsure if they clean up leftover images.
>> > >
>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>> > >
>> > >> I forgot Docker images:
>> > >>
>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> > >> TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
>> > >> Images              88                  9                   125.4GB             124.2GB (99%)
>> > >> Containers          40                  4                   7.927GB             7.871GB (99%)
>> > >> Local Volumes       47                  0                   3.165GB             3.165GB (100%)
>> > >> Build Cache         0                   0                   0B                  0B
>> > >>
>> > >> There are about 90 images on that machine, with all but 1 less than 48
>> > >> hours old.
>> > >> I think the docker test jobs need to try harder at cleaning up their
>> > >> leftover images. (assuming they're already doing it?)
>> > >>
>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>> > >>
>> > >>> The additional slots (@3 directories) take up even more space now than
>> > >>> before.
>> > >>>
>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which could
>> > >>> help by cleaning up workspaces after a run (just started a seed job).
>> > >>>
>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com>
>> > >>> wrote:
>> > >>>
>> > >>>> 664M    beam_PreCommit_JavaPortabilityApi_Commit
>> > >>>> 656M    beam_PreCommit_JavaPortabilityApi_Commit@2
>> > >>>> 611M    beam_PreCommit_JavaPortabilityApi_Cron
>> > >>>> 616M    beam_PreCommit_JavaPortabilityApiJava11_Commit
>> > >>>> 598M    beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>> > >>>> 662M    beam_PreCommit_JavaPortabilityApiJava11_Cron
>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit
>> > >>>> 2.9G    beam_PreCommit_Portable_Python_Commit@2
>> > >>>> 1.7G    beam_PreCommit_Portable_Python_Commit@3
>> > >>>> 3.4G    beam_PreCommit_Portable_Python_Cron
>> > >>>> 1.9G    beam_PreCommit_Python2_PVR_Flink_Commit
>> > >>>> 1.4G    beam_PreCommit_Python2_PVR_Flink_Cron
>> > >>>> 1.3G    beam_PreCommit_Python2_PVR_Flink_Phrase
>> > >>>> 6.2G    beam_PreCommit_Python_Commit
>> > >>>> 7.5G    beam_PreCommit_Python_Commit@2
>> > >>>> 7.5G    beam_PreCommit_Python_Cron
>> > >>>> 1012M   beam_PreCommit_PythonDocker_Commit
>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@2
>> > >>>> 1011M   beam_PreCommit_PythonDocker_Commit@3
>> > >>>> 1002M   beam_PreCommit_PythonDocker_Cron
>> > >>>> 877M    beam_PreCommit_PythonFormatter_Commit
>> > >>>> 988M    beam_PreCommit_PythonFormatter_Cron
>> > >>>> 986M    beam_PreCommit_PythonFormatter_Phrase
>> > >>>> 1.7G    beam_PreCommit_PythonLint_Commit
>> > >>>> 2.1G    beam_PreCommit_PythonLint_Cron
>> > >>>> 7.5G    beam_PreCommit_Python_Phrase
>> > >>>> 346M    beam_PreCommit_RAT_Commit
>> > >>>> 341M    beam_PreCommit_RAT_Cron
>> > >>>> 338M    beam_PreCommit_Spotless_Commit
>> > >>>> 339M    beam_PreCommit_Spotless_Cron
>> > >>>> 5.5G    beam_PreCommit_SQL_Commit
>> > >>>> 5.5G    beam_PreCommit_SQL_Cron
>> > >>>> 5.5G    beam_PreCommit_SQL_Java11_Commit
>> > >>>> 750M    beam_PreCommit_Website_Commit
>> > >>>> 750M    beam_PreCommit_Website_Commit@2
>> > >>>> 750M    beam_PreCommit_Website_Cron
>> > >>>> 764M    beam_PreCommit_Website_Stage_GCS_Commit
>> > >>>> 771M    beam_PreCommit_Website_Stage_GCS_Cron
>> > >>>> 336M    beam_Prober_CommunityMetrics
>> > >>>> 693M    beam_python_mongoio_load_test
>> > >>>> 339M    beam_SeedJob
>> > >>>> 333M    beam_SeedJob_Standalone
>> > >>>> 334M    beam_sonarqube_report
>> > >>>> 556M    beam_SQLBigQueryIO_Batch_Performance_Test_Java
>> > >>>> 175G    total
>> > >>>>
>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <
>> tyso...@google.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Ya looks like something in the workspaces is taking up room:
>> > >>>>>
>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>> > >>>>> 191G    .
>> > >>>>> 191G    total
>> > >>>>>
>> > >>>>>
>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <
>> tyso...@google.com>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>> > >>>>>>
>> > >>>>>> Filesystem      Size  Used Avail Use% Mounted on
>> > >>>>>> /dev/sda1       485G  482G  2.9G 100% /
>> > >>>>>>
>> > >>>>>> However, after cleaning up /tmp with the crontab command, /tmp uses
>> > >>>>>> only 8G, yet the partition still remains 100% full:
>> > >>>>>>
>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>> > >>>>>> 8.0G    /tmp
>> > >>>>>> 8.0G    total
>> > >>>>>>
>> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>> > >>>>>> directory. When I run a du on that, it takes really long. I'll let it keep
>> > >>>>>> running for a while to see if it ever returns a result but so far this
>> > >>>>>> seems suspect.
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <
>> tyso...@google.com>
>> > >>>>>> wrote:
>> > >>>>>>
>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are the
>> > >>>>>>> workspaces, or what are they named?
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com>
>> wrote:
>> > >>>>>>>
>> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces using
>> > >>>>>>>> up the space?
>> > >>>>>>>>
>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <
>> tyso...@google.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't work.
>> > >>>>>>>>> I'll clean up manually on the machine using the cron command.
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <
>> > >>>>>>>>> tyso...@google.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Something isn't working with the current setup because node 15
>> > >>>>>>>>>> appears to be out of space and is currently 'offline' according to Jenkins.
>> > >>>>>>>>>> Can someone run the cleanup job? The machine is full:
>> > >>>>>>>>>>
>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>> > >>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>> > >>>>>>>>>> udev             52G     0   52G   0% /dev
>> > >>>>>>>>>> tmpfs            11G  265M   10G   3% /run
>> > >>>>>>>>>> */dev/sda1       485G  484G  880M 100% /*
>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /dev/shm
>> > >>>>>>>>>> tmpfs           5.0M     0  5.0M   0% /run/lock
>> > >>>>>>>>>> tmpfs            52G     0   52G   0% /sys/fs/cgroup
>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1017
>> > >>>>>>>>>> tmpfs            11G     0   11G   0% /run/user/1037
>> > >>>>>>>>>>
>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>> > >>>>>>>>>> 20G     2020-07-24 17:52        .
>> > >>>>>>>>>> 580M    2020-07-22 17:31        ./junit1031982597110125586
>> > >>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>> > >>>>>>>>>> 517M    2020-07-22 17:31        ./junit1031982597110125586/junit8739924829337821410
>> > >>>>>>>>>> 263M    2020-07-22 12:23        ./pip-install-2GUhO_
>> > >>>>>>>>>> 263M    2020-07-20 09:30        ./pip-install-sxgwqr
>> > >>>>>>>>>> 263M    2020-07-17 13:56        ./pip-install-bWSKIV
>> > >>>>>>>>>> 242M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T
>> > >>>>>>>>>> 242M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK
>> > >>>>>>>>>> 242M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ
>> > >>>>>>>>>> 236M    2020-07-21 20:25        ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>> > >>>>>>>>>> 236M    2020-07-21 20:21        ./beam-pipeline-tempV85xeK/tmppbQHB3
>> > >>>>>>>>>> 236M    2020-07-21 20:15        ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>> > >>>>>>>>>> 111M    2020-07-23 00:57        ./pip-install-1JnyNE
>> > >>>>>>>>>> 105M    2020-07-23 00:17        ./beam-artifact1374651823280819755
>> > >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact5050755582921936972
>> > >>>>>>>>>> 105M    2020-07-23 00:16        ./beam-artifact1834064452502646289
>> > >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact682561790267074916
>> > >>>>>>>>>> 105M    2020-07-23 00:15        ./beam-artifact4691304965824489394
>> > >>>>>>>>>> 105M    2020-07-23 00:14        ./beam-artifact4050383819822604421
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <
>> > >>>>>>>>>> rober...@google.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <
>> > >>>>>>>>>>> tyso...@google.com> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache
>> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the workspace [1]:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic
>> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the build nodes.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    1. Use a ./tmp dir in your jobs workspace. That way it gets
>> > >>>>>>>>>>>>    cleaned up when job workspaces expire.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on Jenkins such that
>> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this should be as simple
>> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable (and making sure it
>> > >>>>>>>>>>> exists/is writable).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    2. Configure your jobs to wipe workspaces on start or
>> > >>>>>>>>>>>>    finish.
>> > >>>>>>>>>>>>    3. Configure your jobs to only keep 5 or 10 previous builds.
>> > >>>>>>>>>>>>    4. Configure your jobs to only keep 5 or 10 previous
>> > >>>>>>>>>>>>    artifacts.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1]: https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>> > >>>>>>>>>>>> k...@apache.org> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> Those file listings look like the result of using standard
>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>> > >>>>>>>>>>>>> tyso...@google.com> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into two examples:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>> > >>>>>>>>>>>>>> 1.6G    2020-07-21 02:25        .
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q
>> > >>>>>>>>>>>>>> 242M    2020-07-17 18:35        ./beam-pipeline-temp79qot2
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> > >>>>>>>>>>>>>> 236M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmpvN7vWA
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:48        ./beam-pipeline-temp3ybuY4/tmprlh_di
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:46        ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:44        ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:42        ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:39        ./beam-pipeline-tempepea7Q/tmptYF1v1
>> > >>>>>>>>>>>>>> 3.7M    2020-07-17 18:35        ./beam-pipeline-temp79qot2/tmplfV0Rg
>> > >>>>>>>>>>>>>> 2.7M    2020-07-17 20:10        ./pip-install-q9l227ef
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>> > >>>>>>>>>>>>>> 817M    2020-07-21 02:26        .
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3
>> > >>>>>>>>>>>>>> 242M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpstXoL0
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpnnVn65
>> > >>>>>>>>>>>>>> 236M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>> > >>>>>>>>>>>>>> 3.7M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmpoj3orz
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmptng9sZ
>> > >>>>>>>>>>>>>> 2.0M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpWp6njc
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:14        ./beam-pipeline-tempUTXqlM/tmphgdj35
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:11        ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>> > >>>>>>>>>>>>>> 1.2M    2020-07-19 12:05        ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>> > >>>>>>>>>>>>>> 992K    2020-07-12 12:00        ./junit642086915811430564
>> > >>>>>>>>>>>>>> 988K    2020-07-12 12:00        ./junit642086915811430564/beam
>> > >>>>>>>>>>>>>> 984K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes
>> > >>>>>>>>>>>>>> 980K    2020-07-12 12:00        ./junit642086915811430564/beam/nodes/0
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <
>> eh...@google.com>
>> > >>>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <
>> > >>>>>>>>>>>>>>> k...@apache.org> wrote:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing something,
>> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect TMPDIR to point
>> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and I
>> > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs that respect this.
>> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability to set this up? Do
>> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find by setting TMPDIR to
>> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write permission to /tmp,
>> > >>>>>>>>>>>>>>>> etc)?
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Kenn
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <
>> > >>>>>>>>>>>>>>>> al...@google.com> wrote:
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously
>> > >>>>>>>>>>>>>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for
>> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful jobs. Alternatively, we
>> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src directories.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal cron
>> > >>>>>>>>>>>>>>>>> scripts to the inventory job
>> > >>>>>>>>>>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>> > >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not know how internal cron
>> > >>>>>>>>>>>>>>>>> scripts are created or maintained, or how they would be recreated for new
>> > >>>>>>>>>>>>>>>>> worker instances.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>> > >>>>>>>>>>>>>>>>> damian.gadom...@polidea.com> wrote:
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Hey,
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp
>> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson:
>> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not
>> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last resort solution for some strange
>> > >>>>>>>>>>>>>>>>>> cases.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker with
>> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a week and deletes all
>> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed for at least three days.
>> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the running jobs on the
>> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in use), and also to be
>> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. The cleanup will always
>> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are blocking the machine.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" errors
>> > >>>>>>>>>>>>>>>>>> may be a consequence of growing workspace directory rather than /tmp. I
>> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, on
>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is 158 GB while /tmp is
>> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size to hold workspaces for
>> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will execute each job) or clear
>> > >>>>>>>>>>>>>>>>>> also the workspaces in some way.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Regards,
>> > >>>>>>>>>>>>>>>>>> Damian
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <
>> > >>>>>>>>>>>>>>>>>> m...@apache.org> wrote:
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>> lead to
>> > >>>>>>>>>>>>>>>>>>> test failures
>> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there
>> is
>> > >>>>>>>>>>>>>>>>>>> the notion of
>> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are
>> running?
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> -Max
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins:
>> > >>>>>>>>>>>>>>>>>>> > beam_Clean_tmp_directory
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>> > >>>>>>>>>>>>>>>>>>> > out of disk related errors in precommit tests currently, perhaps we should
>> > >>>>>>>>>>>>>>>>>>> > schedule this job with cron?
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> >
>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on
>> > >>>>>>>>>>>>>>>>>>> >> jenkins-7 (for example:
>> > >>>>>>>>>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <
>> > >>>>>>>>>>>>>>>>>>> amyrv...@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>> >>> Did a one time cleanup of tmp files owned by
>> jenkins
>> > >>>>>>>>>>>>>>>>>>> older than 3 days.
>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer term solution.
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <
>> > >>>>>>>>>>>>>>>>>>> al...@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup. I agree
>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task or address the root
>> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup.
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <
>> > >>>>>>>>>>>>>>>>>>> michal.wale...@polidea.com>
>> > >>>>>>>>>>>>>>>>>>> >>>> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases (someone with access
>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more frequent
>> > >>>>>>>>>>>>>>>>>>> >>>>> recently and I'd like to discuss how this can be remedied. Can a cleanup
>> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow?
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards
>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software
>> > >>>>>>>>>>>>>>>>>>> Engineer
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <
>> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work>
>> > >>>>>>>>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> >
>>
>
