Agree with Udi, workspaces seem to be the third culprit, and one not yet addressed in any way (until PR#12326 <https://github.com/apache/beam/pull/12326> is merged). I feel that it'll solve the disk-filling issue for a long time ;)

I'm also OK with moving the /tmp cleanup to option B, and will happily investigate a proper TMPDIR config. Rough sketches of both are below.
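For the option B cleanup step, I imagine the inventory job only needs something along these lines (untested sketch; -atime mirrors the access-time check Damian's existing weekly cron script uses, with the threshold dropped from 3 days to 2):

    # Delete files (never directories) under /tmp that have not been
    # accessed for more than 2 days.
    sudo find /tmp -type f -atime +2 -delete

For TMPDIR, presumably something like this in the job environment, so that temp files land in the workspace and get wiped together with it (again just a sketch; where exactly to set it depends on how our Job DSL configures job environments):

    # Point the standard tempfile APIs at a per-workspace tmp directory.
    # $WORKSPACE is the per-job directory Jenkins already provides.
    export TMPDIR="$WORKSPACE/tmp"
    mkdir -p "$TMPDIR"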
On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote: > What about the workspaces, which can take up 175GB in some cases (see > above)? > I'm working on getting them cleaned up automatically: > https://github.com/apache/beam/pull/12326 > > My opinion is that we would get more mileage out of fixing the jobs that > leave behind files in /tmp and images/containers in Docker. > This would also help keep development machines clean. > > > On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <tyso...@google.com> wrote: > >> Here is a summary of how I understand things, >> >> - /tmp and /var/lib/docker are the culprits for filling up disks >> - the inventory Jenkins job runs every 12 hours and runs a docker prune to >> clean up images older than 24hr >> - a weekly crontab on each machine cleans up /tmp files older than three >> days >> >> This doesn't seem to be working, since we're still running out of disk >> periodically and requiring manual intervention. Knobs and options we have >> available: >> >> 1. increase the frequency of deleting files >> 2. decrease the number of days required to delete a file (e.g. older >> than 2 days) >> >> The execution methods we have available are: >> >> A. cron >> - pro: runs even if a job gets stuck in Jenkins due to a full disk >> - con: config baked into the VM, which is tough to update and not >> discoverable or documented well >> B. inventory job >> - pro: easy to update, runs every 12h already >> - con: could get stuck if the Jenkins agent runs out of disk or is >> otherwise stuck; tied to the frequency of the other inventory tasks >> C. configure startup scripts for the VMs that set up the cron job >> anytime the VM is restarted >> - pro: similar to A, and easy to update >> - con: similar to A >> >> Of the three I prefer B, because it is consistent with the other >> inventory jobs. If it turns out that stuck jobs often prevent the >> inventory job from being scheduled, we could further investigate C to >> avoid having to rebuild the VM images repeatedly. >> >> Any objections or comments? If not, we'll go forward with B and reduce >> the date check from 3 days to 2 days. >> >> >> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote: >> > Tests may not be doing docker cleanup. The inventory job runs a docker prune >> > every 12 hours for images older than 24 hrs [1]. Randomly looking at >> one of >> > the recent runs [2], it cleaned up a long list of containers consuming >> > 30+GB of space. That should be just 12 hours' worth of containers. >> > >> > [1] >> > >> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69 >> > [2] >> > >> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console >> > >> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com> >> wrote: >> > >> > > Yes, these are on the same volume in the /var/lib/docker directory. >> I'm >> > > unsure if they clean up leftover images.
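(For reference, the prune that the inventory job runs is presumably close to the following; rough sketch only, the authoritative flags are in job_Inventory.groovy linked above:)

    # Remove stopped containers, unused networks, and all unused images
    # older than 24 hours. Local volumes would need 'docker volume prune'.
    sudo docker system prune --all --force --filter "until=24h"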
>> > > >> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote: >> > > >> > >> I forgot Docker images: >> > >> >> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df >> > >> TYPE TOTAL ACTIVE SIZE >> > >> RECLAIMABLE >> > >> Images 88 9 125.4GB >> > >> 124.2GB (99%) >> > >> Containers 40 4 7.927GB >> > >> 7.871GB (99%) >> > >> Local Volumes 47 0 3.165GB >> > >> 3.165GB (100%) >> > >> Build Cache 0 0 0B >> > >> 0B >> > >> >> > >> There are about 90 images on that machine, with all but 1 less than >> 48 >> > >> hours old. >> > >> I think the docker test jobs need to try harder at cleaning up their >> > >> leftover images. (assuming they're already doing it?) >> > >> >> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote: >> > >> >> > >>> The additional slots (@3 directories) take up even more space now >> than >> > >>> before. >> > >>> >> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which >> could >> > >>> help by cleaning up workspaces after a run (just started a seed >> job). >> > >>> >> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com >> > >> > >>> wrote: >> > >>> >> > >>>> 664M beam_PreCommit_JavaPortabilityApi_Commit >> > >>>> 656M beam_PreCommit_JavaPortabilityApi_Commit@2 >> > >>>> 611M beam_PreCommit_JavaPortabilityApi_Cron >> > >>>> 616M beam_PreCommit_JavaPortabilityApiJava11_Commit >> > >>>> 598M beam_PreCommit_JavaPortabilityApiJava11_Commit@2 >> > >>>> 662M beam_PreCommit_JavaPortabilityApiJava11_Cron >> > >>>> 2.9G beam_PreCommit_Portable_Python_Commit >> > >>>> 2.9G beam_PreCommit_Portable_Python_Commit@2 >> > >>>> 1.7G beam_PreCommit_Portable_Python_Commit@3 >> > >>>> 3.4G beam_PreCommit_Portable_Python_Cron >> > >>>> 1.9G beam_PreCommit_Python2_PVR_Flink_Commit >> > >>>> 1.4G beam_PreCommit_Python2_PVR_Flink_Cron >> > >>>> 1.3G beam_PreCommit_Python2_PVR_Flink_Phrase >> > >>>> 6.2G beam_PreCommit_Python_Commit >> > >>>> 7.5G beam_PreCommit_Python_Commit@2 >> > >>>> 7.5G beam_PreCommit_Python_Cron >> > >>>> 1012M beam_PreCommit_PythonDocker_Commit >> > >>>> 1011M beam_PreCommit_PythonDocker_Commit@2 >> > >>>> 1011M beam_PreCommit_PythonDocker_Commit@3 >> > >>>> 1002M beam_PreCommit_PythonDocker_Cron >> > >>>> 877M beam_PreCommit_PythonFormatter_Commit >> > >>>> 988M beam_PreCommit_PythonFormatter_Cron >> > >>>> 986M beam_PreCommit_PythonFormatter_Phrase >> > >>>> 1.7G beam_PreCommit_PythonLint_Commit >> > >>>> 2.1G beam_PreCommit_PythonLint_Cron >> > >>>> 7.5G beam_PreCommit_Python_Phrase >> > >>>> 346M beam_PreCommit_RAT_Commit >> > >>>> 341M beam_PreCommit_RAT_Cron >> > >>>> 338M beam_PreCommit_Spotless_Commit >> > >>>> 339M beam_PreCommit_Spotless_Cron >> > >>>> 5.5G beam_PreCommit_SQL_Commit >> > >>>> 5.5G beam_PreCommit_SQL_Cron >> > >>>> 5.5G beam_PreCommit_SQL_Java11_Commit >> > >>>> 750M beam_PreCommit_Website_Commit >> > >>>> 750M beam_PreCommit_Website_Commit@2 >> > >>>> 750M beam_PreCommit_Website_Cron >> > >>>> 764M beam_PreCommit_Website_Stage_GCS_Commit >> > >>>> 771M beam_PreCommit_Website_Stage_GCS_Cron >> > >>>> 336M beam_Prober_CommunityMetrics >> > >>>> 693M beam_python_mongoio_load_test >> > >>>> 339M beam_SeedJob >> > >>>> 333M beam_SeedJob_Standalone >> > >>>> 334M beam_sonarqube_report >> > >>>> 556M beam_SQLBigQueryIO_Batch_Performance_Test_Java >> > >>>> 175G total >> > >>>> >> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton < >> tyso...@google.com> >> > >>>> wrote: >> > >>>> >> > >>>>> Ya looks like something in the workspaces is taking up room: >> > 
>>>>> >> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc . >> > >>>>> 191G . >> > >>>>> 191G total >> > >>>>> >> > >>>>> >> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton < tyso...@google.com> >> > >>>>> wrote: >> > >>>>> >> > >>>>>> Node 8 is also full. The partition that /tmp is on is here: >> > >>>>>> >> > >>>>>> Filesystem Size Used Avail Use% Mounted on >> > >>>>>> /dev/sda1 485G 482G 2.9G 100% / >> > >>>>>> >> > >>>>>> however, after cleaning up /tmp with the crontab command, there is >> only >> > >>>>>> 8G of usage, yet it still remains 100% full: >> > >>>>>> >> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp >> > >>>>>> 8.0G /tmp >> > >>>>>> 8.0G total >> > >>>>>> >> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace >> > >>>>>> directory. When I run a du on that, it takes a really long time. I'll >> let it keep >> > >>>>>> running for a while to see if it ever returns a result, but so >> far this >> > >>>>>> seems suspect. >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton < tyso...@google.com> >> > >>>>>> wrote: >> > >>>>>> >> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are >> the >> > >>>>>>> workspaces, or what are they named? >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> >> wrote: >> > >>>>>>> >> > >>>>>>>> I'm curious as to what you find. Was it /tmp or the workspaces >> using >> > >>>>>>>> up the space? >> > >>>>>>>> >> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton < tyso...@google.com> >> > >>>>>>>> wrote: >> > >>>>>>>> >> > >>>>>>>>> Bleck. I just realized that it is 'offline' so that won't >> work. >> > >>>>>>>>> I'll clean up manually on the machine using the cron command. >> > >>>>>>>>> >> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton < >> > >>>>>>>>> tyso...@google.com> wrote: >> > >>>>>>>>> >> > >>>>>>>>>> Something isn't working with the current setup, because node >> 15 >> > >>>>>>>>>> appears to be out of space and is currently 'offline' >> according to Jenkins. >> > >>>>>>>>>> Can someone run the cleanup job? The machine is full: >> > >>>>>>>>>> >> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h >> > >>>>>>>>>> Filesystem Size Used Avail Use% Mounted on >> > >>>>>>>>>> udev 52G 0 52G 0% /dev >> > >>>>>>>>>> tmpfs 11G 265M 10G 3% /run >> > >>>>>>>>>> */dev/sda1 485G 484G 880M 100% /* >> > >>>>>>>>>> tmpfs 52G 0 52G 0% /dev/shm >> > >>>>>>>>>> tmpfs 5.0M 0 5.0M 0% /run/lock >> > >>>>>>>>>> tmpfs 52G 0 52G 0% /sys/fs/cgroup >> > >>>>>>>>>> tmpfs 11G 0 11G 0% /run/user/1017 >> > >>>>>>>>>> tmpfs 11G 0 11G 0% /run/user/1037 >> > >>>>>>>>>> >> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort >> -rhk >> > >>>>>>>>>> 1,1 | head -n 20 >> > >>>>>>>>>> 20G 2020-07-24 17:52 .
>> > >>>>>>>>>> 580M 2020-07-22 17:31 ./junit1031982597110125586 >> > >>>>>>>>>> 517M 2020-07-22 17:31 >> > >>>>>>>>>> >> ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof >> > >>>>>>>>>> 517M 2020-07-22 17:31 >> > >>>>>>>>>> ./junit1031982597110125586/junit8739924829337821410 >> > >>>>>>>>>> 263M 2020-07-22 12:23 ./pip-install-2GUhO_ >> > >>>>>>>>>> 263M 2020-07-20 09:30 ./pip-install-sxgwqr >> > >>>>>>>>>> 263M 2020-07-17 13:56 ./pip-install-bWSKIV >> > >>>>>>>>>> 242M 2020-07-21 20:25 ./beam-pipeline-tempmByU6T >> > >>>>>>>>>> 242M 2020-07-21 20:21 ./beam-pipeline-tempV85xeK >> > >>>>>>>>>> 242M 2020-07-21 20:15 ./beam-pipeline-temp7dJROJ >> > >>>>>>>>>> 236M 2020-07-21 20:25 >> > >>>>>>>>>> ./beam-pipeline-tempmByU6T/tmpOWj3Yr >> > >>>>>>>>>> 236M 2020-07-21 20:21 >> > >>>>>>>>>> ./beam-pipeline-tempV85xeK/tmppbQHB3 >> > >>>>>>>>>> 236M 2020-07-21 20:15 >> > >>>>>>>>>> ./beam-pipeline-temp7dJROJ/tmpgOXPKW >> > >>>>>>>>>> 111M 2020-07-23 00:57 ./pip-install-1JnyNE >> > >>>>>>>>>> 105M 2020-07-23 00:17 >> ./beam-artifact1374651823280819755 >> > >>>>>>>>>> 105M 2020-07-23 00:16 >> ./beam-artifact5050755582921936972 >> > >>>>>>>>>> 105M 2020-07-23 00:16 >> ./beam-artifact1834064452502646289 >> > >>>>>>>>>> 105M 2020-07-23 00:15 >> ./beam-artifact682561790267074916 >> > >>>>>>>>>> 105M 2020-07-23 00:15 >> ./beam-artifact4691304965824489394 >> > >>>>>>>>>> 105M 2020-07-23 00:14 >> ./beam-artifact4050383819822604421 >> > >>>>>>>>>> >> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw < >> > >>>>>>>>>> rober...@google.com> wrote: >> > >>>>>>>>>> >> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton < >> > >>>>>>>>>>> tyso...@google.com> wrote: >> > >>>>>>>>>>> >> > >>>>>>>>>>>> Ah I see, thanks Kenn. I found some advice from the Apache >> > >>>>>>>>>>>> infra wiki that also suggests using a tmpdir inside the >> workspace [1]: >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> Procedures Projects can take to clean up disk space >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some basic >> > >>>>>>>>>>>> steps to help clean up their jobs after themselves on the >> build nodes. >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> 1. Use a ./tmp dir in your jobs workspace. That way it >> gets >> > >>>>>>>>>>>> cleaned up when job workspaces expire. >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>> Tests should be (able to be) written to use the standard >> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on >> Jenkins such that >> > >>>>>>>>>>> that falls into the respective workspaces. Ideally this >> should be as simple >> > >>>>>>>>>>> as setting the TMPDIR (or similar) environment variable >> (and making sure it >> > >>>>>>>>>>> exists/is writable). >> > >>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> 1. Configure your jobs to wipe workspaces on start or >> > >>>>>>>>>>>> finish. >> > >>>>>>>>>>>> 2. Configure your jobs to only keep 5 or 10 previous >> builds. >> > >>>>>>>>>>>> 3. Configure your jobs to only keep 5 or 10 previous >> > >>>>>>>>>>>> artifacts. 
>> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> [1]: >> > >>>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles < >> > >>>>>>>>>>>> k...@apache.org> wrote: >> > >>>>>>>>>>>> >> > >>>>>>>>>>>>> Those file listings look like the result of using standard >> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton < >> > >>>>>>>>>>>>> tyso...@google.com> wrote: >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique >> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into >> two examples: >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | >> sort >> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20 >> > >>>>>>>>>>>>>> 1.6G 2020-07-21 02:25 . >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:48 >> ./beam-pipeline-temp3ybuY4 >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:46 >> ./beam-pipeline-tempuxjiPT >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:44 >> ./beam-pipeline-tempVpg1ME >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:42 >> ./beam-pipeline-tempJ4EpyB >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:39 >> ./beam-pipeline-tempepea7Q >> > >>>>>>>>>>>>>> 242M 2020-07-17 18:35 >> ./beam-pipeline-temp79qot2 >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:48 >> > >>>>>>>>>>>>>> ./beam-pipeline-temp3ybuY4/tmpy_Ytzz >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:46 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempuxjiPT/tmpN5_UfJ >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:44 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempVpg1ME/tmpxSm8pX >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:42 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempJ4EpyB/tmpMZJU76 >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:39 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempepea7Q/tmpWy1vWX >> > >>>>>>>>>>>>>> 236M 2020-07-17 18:35 >> > >>>>>>>>>>>>>> ./beam-pipeline-temp79qot2/tmpvN7vWA >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:48 >> > >>>>>>>>>>>>>> ./beam-pipeline-temp3ybuY4/tmprlh_di >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:46 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempuxjiPT/tmpLmVWfe >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:44 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempVpg1ME/tmpvrxbY7 >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:42 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:39 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempepea7Q/tmptYF1v1 >> > >>>>>>>>>>>>>> 3.7M 2020-07-17 18:35 >> > >>>>>>>>>>>>>> ./beam-pipeline-temp79qot2/tmplfV0Rg >> > >>>>>>>>>>>>>> 2.7M 2020-07-17 20:10 ./pip-install-q9l227ef >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | >> sort >> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20 >> > >>>>>>>>>>>>>> 817M 2020-07-21 02:26 . 
>> > >>>>>>>>>>>>>> 242M 2020-07-19 12:14 >> ./beam-pipeline-tempUTXqlM >> > >>>>>>>>>>>>>> 242M 2020-07-19 12:11 >> ./beam-pipeline-tempx3Yno3 >> > >>>>>>>>>>>>>> 242M 2020-07-19 12:05 >> ./beam-pipeline-tempyCrMYq >> > >>>>>>>>>>>>>> 236M 2020-07-19 12:14 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpstXoL0 >> > >>>>>>>>>>>>>> 236M 2020-07-19 12:11 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmpnnVn65 >> > >>>>>>>>>>>>>> 236M 2020-07-19 12:05 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpRF0iNs >> > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:14 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpbJjUAQ >> > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:11 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmpsmmzqe >> > >>>>>>>>>>>>>> 3.7M 2020-07-19 12:05 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmp5b3ZvY >> > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:14 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmpoj3orz >> > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:11 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmptng9sZ >> > >>>>>>>>>>>>>> 2.0M 2020-07-19 12:05 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpWp6njc >> > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:14 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempUTXqlM/tmphgdj35 >> > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:11 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempx3Yno3/tmp8ySXpm >> > >>>>>>>>>>>>>> 1.2M 2020-07-19 12:05 >> > >>>>>>>>>>>>>> ./beam-pipeline-tempyCrMYq/tmpNVEJ4e >> > >>>>>>>>>>>>>> 992K 2020-07-12 12:00 ./junit642086915811430564 >> > >>>>>>>>>>>>>> 988K 2020-07-12 12:00 >> ./junit642086915811430564/beam >> > >>>>>>>>>>>>>> 984K 2020-07-12 12:00 >> > >>>>>>>>>>>>>> ./junit642086915811430564/beam/nodes >> > >>>>>>>>>>>>>> 980K 2020-07-12 12:00 >> > >>>>>>>>>>>>>> ./junit642086915811430564/beam/nodes/0 >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri < >> eh...@google.com> >> > >>>>>>>>>>>>>> wrote: >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic. >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles < >> > >>>>>>>>>>>>>>> k...@apache.org> wrote: >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing >> something, >> > >>>>>>>>>>>>>>>> but why are we writing to /tmp at all? I would expect >> TMPDIR to point >> > >>>>>>>>>>>>>>>> somewhere inside the job directory that will be wiped >> by Jenkins, and I >> > >>>>>>>>>>>>>>>> would expect code to always create temp files via APIs >> that respect this. >> > >>>>>>>>>>>>>>>> Is Jenkins not cleaning up? Do we not have the ability >> to set this up? Do >> > >>>>>>>>>>>>>>>> we have bugs in our code (that we could probably find >> by setting TMPDIR to >> > >>>>>>>>>>>>>>>> somewhere not-/tmp and running the tests without write >> permission to /tmp, >> > >>>>>>>>>>>>>>>> etc) >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> Kenn >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay < >> > >>>>>>>>>>>>>>>> al...@google.com> wrote: >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri >> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously >> ( >> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9865) for >> > >>>>>>>>>>>>>>>>> cleaning up workspace directory after successful >> jobs. Alternatively, we >> > >>>>>>>>>>>>>>>>> can consider periodically cleaning up the /src >> directories. 
>> > >>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal >> cron >> > >>>>>>>>>>>>>>>>> scripts to the inventory job ( >> > >>>>>>>>>>>>>>>>> >> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51 >> ). >> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the >> source tree, adjust >> > >>>>>>>>>>>>>>>>> frequencies and clean up code with PRs. I do not >> know how internal cron >> > >>>>>>>>>>>>>>>>> scripts are created, maintained, and how they would >> be recreated for new >> > >>>>>>>>>>>>>>>>> worker instances. >> > >>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com> >> > >>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski < >> > >>>>>>>>>>>>>>>>> damian.gadom...@polidea.com> wrote: >> > >>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> Hey, >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing /tmp >> > >>>>>>>>>>>>>>>>>> directory. Part of it is the job mentioned by Tyson: >> > >>>>>>>>>>>>>>>>>> *beam_Clean_tmp_directory*. It's intentionally not >> > >>>>>>>>>>>>>>>>>> triggered by cron and should be a last-resort >> solution for some strange >> > >>>>>>>>>>>>>>>>>> cases. >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker >> with >> > >>>>>>>>>>>>>>>>>> an internal cron script. It's being executed once a >> week and deletes all >> > >>>>>>>>>>>>>>>>>> the files (and only files) that were not accessed >> for at least three days. >> > >>>>>>>>>>>>>>>>>> That's designed to be as safe as possible for the >> running jobs on the >> > >>>>>>>>>>>>>>>>>> worker (not to delete the files that are still in >> use), and also to be >> > >>>>>>>>>>>>>>>>>> insensitive to the current workload on the machine. >> The cleanup will always >> > >>>>>>>>>>>>>>>>>> happen, even if some long-running/stuck jobs are >> blocking the machine. >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left" >> errors >> > >>>>>>>>>>>>>>>>>> may be a consequence of the growing workspace directory >> rather than /tmp. I >> > >>>>>>>>>>>>>>>>>> didn't do any detailed analysis but e.g. currently, >> on >> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size >> is 158 GB while /tmp is >> > >>>>>>>>>>>>>>>>>> only 16 GB. We should either guarantee the disk size >> to hold workspaces for >> > >>>>>>>>>>>>>>>>>> all jobs (because eventually, every worker will >> execute each job) or also >> > >>>>>>>>>>>>>>>>>> clear the workspaces in some way. >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> Regards, >> > >>>>>>>>>>>>>>>>>> Damian >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels < >> > >>>>>>>>>>>>>>>>>> m...@apache.org> wrote: >> > >>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't >> lead to >> > >>>>>>>>>>>>>>>>>>> test failures >> > >>>>>>>>>>>>>>>>>>> while running. Not a Jenkins expert but maybe there >> is >> > >>>>>>>>>>>>>>>>>>> the notion of >> > >>>>>>>>>>>>>>>>>>> running exclusively while no other tasks are >> running?
>> > >>>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>>> -Max >> > >>>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote: >> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in >> Jenkins: >> > >>>>>>>>>>>>>>>>>>> beam_Clean_tmp_directory >> > >>>>>>>>>>>>>>>>>>> > >> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing >> some >> > >>>>>>>>>>>>>>>>>>> out-of-disk related errors in precommit tests >> currently; perhaps we should >> > >>>>>>>>>>>>>>>>>>> schedule this job with cron? >> > >>>>>>>>>>>>>>>>>>> > >> > >>>>>>>>>>>>>>>>>>> > >> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee < >> > >>>>>>>>>>>>>>>>>>> heej...@google.com> wrote: >> > >>>>>>>>>>>>>>>>>>> >> Still seeing no space left on device errors on >> > >>>>>>>>>>>>>>>>>>> jenkins-7 (for example: >> > >>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/ >> > >>>>>>>>>>>>>>>>>>> ) >> > >>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold < >> > >>>>>>>>>>>>>>>>>>> amyrv...@google.com> wrote: >> > >>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by >> jenkins >> > >>>>>>>>>>>>>>>>>>> older than 3 days. >> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer-term solution. >> > >>>>>>>>>>>>>>>>>>> >>> >> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except >> > >>>>>>>>>>>>>>>>>>> jenkins-12, which has not >> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days. >> Not >> > >>>>>>>>>>>>>>>>>>> scheduling: >> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay < al...@google.com> wrote:
>> > >>>>>>>>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup.
I agree >> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this >> > >>>>>>>>>>>>>>>>>>> task or address the root >> > >>>>>>>>>>>>>>>>>>> >>>> cause of the buildup. >> > >>>>>>>>>>>>>>>>>>> >>>> >> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia < >> > >>>>>>>>>>>>>>>>>>> michal.wale...@polidea.com> >> > >>>>>>>>>>>>>>>>>>> >>>> wrote: >> > >>>>>>>>>>>>>>>>>>> >>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there, >> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers >> > >>>>>>>>>>>>>>>>>>> again. Nodes 1 and 7 >> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device". >> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases >> > >>>>>>>>>>>>>>>>>>> (someone with access >> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)? >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more >> > >>>>>>>>>>>>>>>>>>> and more frequent >> > >>>>>>>>>>>>>>>>>>> >>>>> recently, and I'd like to discuss how this can be >> > >>>>>>>>>>>>>>>>>>> remedied. Can a cleanup >> > >>>>>>>>>>>>>>>>>>> >>>>> task be automated on Jenkins somehow? >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> Regards >> > >>>>>>>>>>>>>>>>>>> >>>>> Michal >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> -- >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia >> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software >> > >>>>>>>>>>>>>>>>>>> Engineer >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002 >> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech >> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! < >> > >>>>>>>>>>>>>>>>>>> https://www.polidea.com/our-work> >> > >>>>>>>>>>>>>>>>>>> >>>>> >> > >>>>>>>>>>>>>>>>>>> >>>> >> > >>>>>>>>>>>>>>>>>>> >> >> > >>>>>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>>>> >> > >> >