Just checking - will this wipe out the dependency cache? That would slow things down and significantly increase flakiness.

If I recall correctly, the default Jenkins layout was:

    /home/jenkins/jenkins-slave/workspace/$jobname
    /home/jenkins/jenkins-slave/workspace/$jobname/.m2
    /home/jenkins/jenkins-slave/workspace/$jobname/.git

where you can see that it did a `git clone` right into the root workspace
directory, adjacent to .m2. This was not hygienic. One important consequence
was that `git clean` would wipe the maven cache with every build. So in
https://github.com/apache/beam/pull/3976 we changed it to:

    /home/jenkins/jenkins-slave/workspace/$jobname
    /home/jenkins/jenkins-slave/workspace/$jobname/.m2
    /home/jenkins/jenkins-slave/workspace/$jobname/src/.git

Now the .m2 directory survives and we do not constantly see flakes
re-downloading deps that are immutable. This does, of course, use disk
space. That was in the maven days. Gradle is the same, except that $HOME/.m2
is replaced by $HOME/.gradle/caches/modules-2/files-2.1.

Is Jenkins configured the same way, so that we will be wiping out the
dependencies? If so, can you address this issue? Everything in that
directory should be immutable and just a cache to avoid pointless
re-downloads.

Kenn

On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <damian.gadom...@polidea.com> wrote:

> Agree with Udi, workspaces seem to be the third culprit, not yet addressed
> in any way (until PR#12326 <https://github.com/apache/beam/pull/12326> is
> merged). I feel that it'll solve the issue of filling up the disks for a
> long time ;)
>
> I'm also OK with moving /tmp cleanup to option B, and will happily
> investigate proper TMPDIR config.
>
> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri <eh...@google.com> wrote:
>
>> What about the workspaces, which can take up 175GB in some cases (see
>> above)?
>> I'm working on getting them cleaned up automatically:
>> https://github.com/apache/beam/pull/12326
>>
>> My opinion is that we would get more mileage out of fixing the jobs that
>> leave behind files in /tmp and images/containers in Docker.
>> This would also help keep development machines clean.
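[Editor's note] Kenn's point about `git clean` wiping an adjacent cache can be reproduced locally. This is a hedged illustration only (temporary directories stand in for the Jenkins workspace; it is not the actual Beam CI configuration):

```shell
#!/bin/sh
# Illustration of why the old layout wiped the dependency cache:
# with .git at the workspace root, `git clean -fdx` removes every
# untracked/ignored path, including an adjacent .m2 cache. Moving the
# clone into src/ keeps the cache outside git's reach.
set -e

ws=$(mktemp -d)                 # stands in for the old workspace layout
git init -q "$ws"               # .git at the workspace root
mkdir -p "$ws/.m2/repository"
touch "$ws/.m2/repository/cached-dep.jar"
git -C "$ws" clean -fdx -q      # wipes untracked dirs, .m2 included
test ! -d "$ws/.m2" && echo "old layout: .m2 wiped"

ws2=$(mktemp -d)                # stands in for the post-PR#3976 layout
git init -q "$ws2/src"          # clone lives in src/, adjacent to .m2
mkdir -p "$ws2/.m2/repository"
git -C "$ws2/src" clean -fdx -q # only cleans inside src/
test -d "$ws2/.m2" && echo "new layout: .m2 survives"
```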
>>
>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton <tyso...@google.com> wrote:
>>
>>> Here is a summary of how I understand things:
>>>
>>> - /tmp and /var/lib/docker are the culprits for filling up disks
>>> - the inventory Jenkins job runs every 12 hours and runs a docker prune
>>>   to clean up images older than 24hr
>>> - a crontab on each machine cleans up /tmp files older than three days,
>>>   weekly
>>>
>>> This doesn't seem to be working, since we're still running out of disk
>>> periodically and requiring manual intervention. Knobs and options we
>>> have available:
>>>
>>> 1. increase the frequency of deleting files
>>> 2. decrease the number of days required to delete a file (e.g. older
>>>    than 2 days)
>>>
>>> The execution methods we have available are:
>>>
>>> A. cron
>>>    - pro: runs even if a job gets stuck in Jenkins due to a full disk
>>>    - con: config is baked into the VM, which is tough to update and not
>>>      discoverable or documented well
>>> B. inventory job
>>>    - pro: easy to update; runs every 12h already
>>>    - con: could get stuck if the Jenkins agent runs out of disk or is
>>>      otherwise stuck; tied to the frequency of all other inventory jobs
>>> C. startup scripts for the VMs that set up the cron job anytime the VM
>>>    is restarted
>>>    - pro: similar to A, and easy to update
>>>    - con: similar to A
>>>
>>> Between the three I prefer B because it is consistent with the other
>>> inventory jobs. If it turns out that stuck jobs often prevent
>>> scheduling of the inventory job, we could further investigate C to
>>> avoid having to rebuild the VM images repeatedly.
>>>
>>> Any objections or comments? If not, we'll go forward with B and reduce
>>> the date check from 3 days to 2 days.
>>>
>>> On 2020/07/24 20:13:29, Ahmet Altay <al...@google.com> wrote:
>>> > Tests may not be doing docker cleanup. The inventory job runs a
>>> > docker prune every 12 hours for images older than 24 hrs [1].
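[Editor's note] The atime-based /tmp cleanup being tuned above presumably boils down to a `find` invocation. A minimal sketch of option 2 (the actual crontab entry on the workers is not shown in the thread, so the path and flags here are assumptions):

```shell
#!/bin/sh
# Hedged sketch of an atime-based /tmp cleanup. By find's rounding
# rules, `-atime +1` selects files last accessed more than one full
# 24-hour period ago, i.e. the proposed "older than 2 days" threshold;
# the original three-day rule would be `-atime +2`.
# Only plain files are deleted, never directories, so directory trees
# still in use by running jobs are left in place. -xdev keeps the scan
# on the same filesystem.
find /tmp -xdev -type f -atime +1 -delete
```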
>>> > Randomly looking at one of the recent runs [2], it cleaned up a long
>>> > list of containers consuming 30+GB of space. That should be just 12
>>> > hours' worth of containers.
>>> >
>>> > [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>> > [2] https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>> >
>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton <tyso...@google.com> wrote:
>>> >
>>> > > Yes, these are on the same volume, in the /var/lib/docker
>>> > > directory. I'm unsure if they clean up leftover images.
>>> > >
>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri <eh...@google.com> wrote:
>>> > >
>>> > >> I forgot Docker images:
>>> > >>
>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>>> > >> TYPE           TOTAL  ACTIVE  SIZE     RECLAIMABLE
>>> > >> Images         88     9       125.4GB  124.2GB (99%)
>>> > >> Containers     40     4       7.927GB  7.871GB (99%)
>>> > >> Local Volumes  47     0       3.165GB  3.165GB (100%)
>>> > >> Build Cache    0      0       0B       0B
>>> > >>
>>> > >> There are about 90 images on that machine, with all but 1 less
>>> > >> than 48 hours old.
>>> > >> I think the docker test jobs need to try harder at cleaning up
>>> > >> their leftover images (assuming they're already doing it?).
>>> > >>
>>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri <eh...@google.com> wrote:
>>> > >>
>>> > >>> The additional slots (@3 directories) take up even more space now
>>> > >>> than before.
>>> > >>>
>>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>>> > >>> could help by cleaning up workspaces after a run (just started a
>>> > >>> seed job).
>>> > >>>
>>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>
>>> > >>>> 664M   beam_PreCommit_JavaPortabilityApi_Commit
>>> > >>>> 656M   beam_PreCommit_JavaPortabilityApi_Commit@2
>>> > >>>> 611M   beam_PreCommit_JavaPortabilityApi_Cron
>>> > >>>> 616M   beam_PreCommit_JavaPortabilityApiJava11_Commit
>>> > >>>> 598M   beam_PreCommit_JavaPortabilityApiJava11_Commit@2
>>> > >>>> 662M   beam_PreCommit_JavaPortabilityApiJava11_Cron
>>> > >>>> 2.9G   beam_PreCommit_Portable_Python_Commit
>>> > >>>> 2.9G   beam_PreCommit_Portable_Python_Commit@2
>>> > >>>> 1.7G   beam_PreCommit_Portable_Python_Commit@3
>>> > >>>> 3.4G   beam_PreCommit_Portable_Python_Cron
>>> > >>>> 1.9G   beam_PreCommit_Python2_PVR_Flink_Commit
>>> > >>>> 1.4G   beam_PreCommit_Python2_PVR_Flink_Cron
>>> > >>>> 1.3G   beam_PreCommit_Python2_PVR_Flink_Phrase
>>> > >>>> 6.2G   beam_PreCommit_Python_Commit
>>> > >>>> 7.5G   beam_PreCommit_Python_Commit@2
>>> > >>>> 7.5G   beam_PreCommit_Python_Cron
>>> > >>>> 1012M  beam_PreCommit_PythonDocker_Commit
>>> > >>>> 1011M  beam_PreCommit_PythonDocker_Commit@2
>>> > >>>> 1011M  beam_PreCommit_PythonDocker_Commit@3
>>> > >>>> 1002M  beam_PreCommit_PythonDocker_Cron
>>> > >>>> 877M   beam_PreCommit_PythonFormatter_Commit
>>> > >>>> 988M   beam_PreCommit_PythonFormatter_Cron
>>> > >>>> 986M   beam_PreCommit_PythonFormatter_Phrase
>>> > >>>> 1.7G   beam_PreCommit_PythonLint_Commit
>>> > >>>> 2.1G   beam_PreCommit_PythonLint_Cron
>>> > >>>> 7.5G   beam_PreCommit_Python_Phrase
>>> > >>>> 346M   beam_PreCommit_RAT_Commit
>>> > >>>> 341M   beam_PreCommit_RAT_Cron
>>> > >>>> 338M   beam_PreCommit_Spotless_Commit
>>> > >>>> 339M   beam_PreCommit_Spotless_Cron
>>> > >>>> 5.5G   beam_PreCommit_SQL_Commit
>>> > >>>> 5.5G   beam_PreCommit_SQL_Cron
>>> > >>>> 5.5G   beam_PreCommit_SQL_Java11_Commit
>>> > >>>> 750M   beam_PreCommit_Website_Commit
>>> > >>>> 750M   beam_PreCommit_Website_Commit@2
>>> > >>>> 750M   beam_PreCommit_Website_Cron
>>> > >>>> 764M   beam_PreCommit_Website_Stage_GCS_Commit
>>> > >>>> 771M   beam_PreCommit_Website_Stage_GCS_Cron
>>> > >>>> 336M   beam_Prober_CommunityMetrics
>>> > >>>> 693M   beam_python_mongoio_load_test
>>> > >>>> 339M   beam_SeedJob
>>> > >>>> 333M   beam_SeedJob_Standalone
>>> > >>>> 334M   beam_sonarqube_report
>>> > >>>> 556M   beam_SQLBigQueryIO_Batch_Performance_Test_Java
>>> > >>>> 175G   total
>>> > >>>>
>>> > >>>> On Fri, Jul 24, 2020 at 12:04 PM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>
>>> > >>>>> Ya, looks like something in the workspaces is taking up room:
>>> > >>>>>
>>> > >>>>> @apache-ci-beam-jenkins-8:/home/jenkins$ sudo du -shc .
>>> > >>>>> 191G  .
>>> > >>>>> 191G  total
>>> > >>>>>
>>> > >>>>> On Fri, Jul 24, 2020 at 11:44 AM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>
>>> > >>>>>> Node 8 is also full. The partition that /tmp is on is here:
>>> > >>>>>>
>>> > >>>>>> Filesystem  Size  Used  Avail  Use%  Mounted on
>>> > >>>>>> /dev/sda1   485G  482G   2.9G  100%  /
>>> > >>>>>>
>>> > >>>>>> However, after cleaning up /tmp with the crontab command, there
>>> > >>>>>> is only 8G of usage, yet the disk remains 100% full:
>>> > >>>>>>
>>> > >>>>>> @apache-ci-beam-jenkins-8:/tmp$ sudo du -shc /tmp
>>> > >>>>>> 8.0G  /tmp
>>> > >>>>>> 8.0G  total
>>> > >>>>>>
>>> > >>>>>> The workspaces are in the /home/jenkins/jenkins-slave/workspace
>>> > >>>>>> directory. When I run a du on that, it takes really long. I'll
>>> > >>>>>> let it keep running for a while to see if it ever returns a
>>> > >>>>>> result, but so far this seems suspect.
>>> > >>>>>>
>>> > >>>>>> On Fri, Jul 24, 2020 at 11:19 AM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>>
>>> > >>>>>>> Everything I've been looking at is in the /tmp dir. Where are
>>> > >>>>>>> the workspaces, or what are they named?
>>> > >>>>>>>
>>> > >>>>>>> On Fri, Jul 24, 2020 at 11:03 AM Udi Meiri <eh...@google.com> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> I'm curious what you find. Was it /tmp or the workspaces
>>> > >>>>>>>> using up the space?
>>> > >>>>>>>>
>>> > >>>>>>>> On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>>> Bleck. I just realized that it is 'offline', so that won't
>>> > >>>>>>>>> work. I'll clean up manually on the machine using the cron
>>> > >>>>>>>>> command.
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>>>>>
>>> > >>>>>>>>>> Something isn't working with the current setup, because
>>> > >>>>>>>>>> node 15 appears to be out of space and is currently
>>> > >>>>>>>>>> 'offline' according to Jenkins. Can someone run the cleanup
>>> > >>>>>>>>>> job? The machine is full:
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>>> > >>>>>>>>>> Filesystem  Size  Used  Avail  Use%  Mounted on
>>> > >>>>>>>>>> udev         52G     0    52G    0%  /dev
>>> > >>>>>>>>>> tmpfs        11G  265M    10G    3%  /run
>>> > >>>>>>>>>> */dev/sda1  485G  484G   880M  100%  /*
>>> > >>>>>>>>>> tmpfs        52G     0    52G    0%  /dev/shm
>>> > >>>>>>>>>> tmpfs       5.0M     0   5.0M    0%  /run/lock
>>> > >>>>>>>>>> tmpfs        52G     0    52G    0%  /sys/fs/cgroup
>>> > >>>>>>>>>> tmpfs        11G     0    11G    0%  /run/user/1017
>>> > >>>>>>>>>> tmpfs        11G     0    11G    0%  /run/user/1037
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>> > >>>>>>>>>> 20G   2020-07-24 17:52  .
>>> > >>>>>>>>>> 580M  2020-07-22 17:31  ./junit1031982597110125586
>>> > >>>>>>>>>> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>>> > >>>>>>>>>> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410
>>> > >>>>>>>>>> 263M  2020-07-22 12:23  ./pip-install-2GUhO_
>>> > >>>>>>>>>> 263M  2020-07-20 09:30  ./pip-install-sxgwqr
>>> > >>>>>>>>>> 263M  2020-07-17 13:56  ./pip-install-bWSKIV
>>> > >>>>>>>>>> 242M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T
>>> > >>>>>>>>>> 242M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK
>>> > >>>>>>>>>> 242M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ
>>> > >>>>>>>>>> 236M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>>> > >>>>>>>>>> 236M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK/tmppbQHB3
>>> > >>>>>>>>>> 236M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>>> > >>>>>>>>>> 111M  2020-07-23 00:57  ./pip-install-1JnyNE
>>> > >>>>>>>>>> 105M  2020-07-23 00:17  ./beam-artifact1374651823280819755
>>> > >>>>>>>>>> 105M  2020-07-23 00:16  ./beam-artifact5050755582921936972
>>> > >>>>>>>>>> 105M  2020-07-23 00:16  ./beam-artifact1834064452502646289
>>> > >>>>>>>>>> 105M  2020-07-23 00:15  ./beam-artifact682561790267074916
>>> > >>>>>>>>>> 105M  2020-07-23 00:15  ./beam-artifact4691304965824489394
>>> > >>>>>>>>>> 105M  2020-07-23 00:14  ./beam-artifact4050383819822604421
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <rober...@google.com> wrote:
>>> > >>>>>>>>>>
>>> > >>>>>>>>>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>> Ah I see, thanks Kenn.
>>> > >>>>>>>>>>>> I found some advice from the Apache infra wiki that also
>>> > >>>>>>>>>>>> suggests using a tmpdir inside the workspace [1]:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Procedures projects can take to clean up disk space:
>>> > >>>>>>>>>>>> Projects can help themselves and Infra by taking some
>>> > >>>>>>>>>>>> basic steps to help clean up their jobs after themselves
>>> > >>>>>>>>>>>> on the build nodes.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> 1. Use a ./tmp dir in your job's workspace. That way it
>>> > >>>>>>>>>>>>    gets cleaned up when job workspaces expire.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>> Tests should be (able to be) written to use the standard
>>> > >>>>>>>>>>> temporary file mechanisms, and the environment set up on
>>> > >>>>>>>>>>> Jenkins such that those land in the respective workspaces.
>>> > >>>>>>>>>>> Ideally this should be as simple as setting the TMPDIR (or
>>> > >>>>>>>>>>> similar) environment variable (and making sure it
>>> > >>>>>>>>>>> exists/is writable).
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>> 2. Configure your jobs to wipe workspaces on start or
>>> > >>>>>>>>>>>>    finish.
>>> > >>>>>>>>>>>> 3. Configure your jobs to only keep 5 or 10 previous
>>> > >>>>>>>>>>>>    builds.
>>> > >>>>>>>>>>>> 4. Configure your jobs to only keep 5 or 10 previous
>>> > >>>>>>>>>>>>    artifacts.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> [1]: https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <k...@apache.org> wrote:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>> Those file listings look like the result of using
>>> > >>>>>>>>>>>>> standard temp file APIs but with TMPDIR set to /tmp.
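[Editor's note] The infra advice and the TMPDIR observation above can be combined into a small job-setup sketch. This is an assumed configuration, not the actual Beam Jenkins setup ($WORKSPACE is the variable Jenkins exports for a job's workspace root):

```shell
#!/bin/sh
# Hedged sketch: point standard temp-file APIs at a directory inside
# the job workspace so temp files expire with the workspace instead of
# accumulating in /tmp. Assumes a Jenkins-style $WORKSPACE variable.
export TMPDIR="${WORKSPACE:-$PWD}/tmp"
mkdir -p "$TMPDIR"
# POSIX mktemp honors TMPDIR; Python's tempfile does too, and a JVM
# can be pointed the same way with -Djava.io.tmpdir="$TMPDIR".
scratch=$(mktemp)    # lands under $TMPDIR, not /tmp
echo "scratch file created at: $scratch"
```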
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <tyso...@google.com> wrote:
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look at
>>> > >>>>>>>>>>>>>> two examples:
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>> > >>>>>>>>>>>>>> 1.6G  2020-07-21 02:25  .
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
>>> > >>>>>>>>>>>>>> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>> > >>>>>>>>>>>>>> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>> > >>>>>>>>>>>>>> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>> > >>>>>>>>>>>>>> 817M  2020-07-21 02:26  .
>>> > >>>>>>>>>>>>>> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
>>> > >>>>>>>>>>>>>> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
>>> > >>>>>>>>>>>>>> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
>>> > >>>>>>>>>>>>>> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>> > >>>>>>>>>>>>>> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>> > >>>>>>>>>>>>>> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>> > >>>>>>>>>>>>>> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>> > >>>>>>>>>>>>>> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>> > >>>>>>>>>>>>>> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>> > >>>>>>>>>>>>>> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>> > >>>>>>>>>>>>>> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>> > >>>>>>>>>>>>>> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>> > >>>>>>>>>>>>>> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>> > >>>>>>>>>>>>>> 992K  2020-07-12 12:00  ./junit642086915811430564
>>> > >>>>>>>>>>>>>> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
>>> > >>>>>>>>>>>>>> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
>>> > >>>>>>>>>>>>>> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>> > >>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>> You're right, job workspaces should be hermetic.
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:
>>> > >>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> I'm probably late to this discussion and missing
>>> > >>>>>>>>>>>>>>>> something, but why are we writing to /tmp at all? I
>>> > >>>>>>>>>>>>>>>> would expect TMPDIR to point somewhere inside the job
>>> > >>>>>>>>>>>>>>>> directory that will be wiped by Jenkins, and I would
>>> > >>>>>>>>>>>>>>>> expect code to always create temp files via APIs that
>>> > >>>>>>>>>>>>>>>> respect this. Is Jenkins not cleaning up? Do we not
>>> > >>>>>>>>>>>>>>>> have the ability to set this up? Do we have bugs in
>>> > >>>>>>>>>>>>>>>> our code (that we could probably find by setting
>>> > >>>>>>>>>>>>>>>> TMPDIR to somewhere not-/tmp and running the tests
>>> > >>>>>>>>>>>>>>>> without write permission to /tmp, etc.)?
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> Kenn
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> Related to workspace directory growth, +Udi Meiri
>>> > >>>>>>>>>>>>>>>>> <eh...@google.com> filed a relevant issue previously
>>> > >>>>>>>>>>>>>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for
>>> > >>>>>>>>>>>>>>>>> cleaning up the workspace directory after successful
>>> > >>>>>>>>>>>>>>>>> jobs.
>>> > >>>>>>>>>>>>>>>>> Alternatively, we can consider periodically cleaning
>>> > >>>>>>>>>>>>>>>>> up the /src directories.
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> I would suggest moving the cron task from internal
>>> > >>>>>>>>>>>>>>>>> cron scripts to the inventory job
>>> > >>>>>>>>>>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>> > >>>>>>>>>>>>>>>>> That way, we can see all the cron jobs as part of the
>>> > >>>>>>>>>>>>>>>>> source tree, and adjust frequencies and clean up code
>>> > >>>>>>>>>>>>>>>>> with PRs. I do not know how the internal cron scripts
>>> > >>>>>>>>>>>>>>>>> are created and maintained, or how they would be
>>> > >>>>>>>>>>>>>>>>> recreated for new worker instances.
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <damian.gadom...@polidea.com> wrote:
>>> > >>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Hey,
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> I've recently created a solution for the growing
>>> > >>>>>>>>>>>>>>>>>> /tmp directory. Part of it is the job mentioned by
>>> > >>>>>>>>>>>>>>>>>> Tyson: *beam_Clean_tmp_directory*. It's
>>> > >>>>>>>>>>>>>>>>>> intentionally not triggered by cron and should be a
>>> > >>>>>>>>>>>>>>>>>> last-resort solution for some strange cases.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Along with that job, I've also updated every worker
>>> > >>>>>>>>>>>>>>>>>> with an internal cron script. It's executed once a
>>> > >>>>>>>>>>>>>>>>>> week and deletes all the files (and only files) that
>>> > >>>>>>>>>>>>>>>>>> were not accessed for at least three days. That's
>>> > >>>>>>>>>>>>>>>>>> designed to be as safe as possible for the running
>>> > >>>>>>>>>>>>>>>>>> jobs on the worker (not deleting files that are
>>> > >>>>>>>>>>>>>>>>>> still in use), and also to be insensitive to the
>>> > >>>>>>>>>>>>>>>>>> current workload on the machine. The cleanup will
>>> > >>>>>>>>>>>>>>>>>> always happen, even if some long-running/stuck jobs
>>> > >>>>>>>>>>>>>>>>>> are blocking the machine.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> I also think that currently the "No space left"
>>> > >>>>>>>>>>>>>>>>>> errors may be a consequence of the growing workspace
>>> > >>>>>>>>>>>>>>>>>> directory rather than /tmp. I didn't do any detailed
>>> > >>>>>>>>>>>>>>>>>> analysis, but e.g. currently, on
>>> > >>>>>>>>>>>>>>>>>> apache-beam-jenkins-7, the workspace directory size
>>> > >>>>>>>>>>>>>>>>>> is 158 GB while /tmp is only 16 GB. We should either
>>> > >>>>>>>>>>>>>>>>>> guarantee enough disk to hold workspaces for all
>>> > >>>>>>>>>>>>>>>>>> jobs (because eventually, every worker will execute
>>> > >>>>>>>>>>>>>>>>>> each job) or also clear the workspaces in some way.
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> Regards,
>>> > >>>>>>>>>>>>>>>>>> Damian
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org> wrote:
>>> > >>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> +1 for scheduling it via a cron job if it won't
>>> > >>>>>>>>>>>>>>>>>>> lead to test failures while running. Not a Jenkins
>>> > >>>>>>>>>>>>>>>>>>> expert, but maybe there is a notion of running
>>> > >>>>>>>>>>>>>>>>>>> exclusively while no other tasks are running?
>>> > >>>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> -Max
>>> > >>>>>>>>>>>>>>>>>>>
>>> > >>>>>>>>>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>> > >>>>>>>>>>>>>>>>>>> > FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory
>>> > >>>>>>>>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk-related
>>> > >>>>>>>>>>>>>>>>>>> > errors in precommit tests currently; perhaps we should schedule this job
>>> > >>>>>>>>>>>>>>>>>>> > with cron?
>>> > >>>>>>>>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for example:
>>> > >>>>>>>>>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins older than 3 days.
>>> > >>>>>>>>>>>>>>>>>>> >>> Agree that we need a longer-term solution.
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>>> > >>>>>>>>>>>>>>>>>>> >>> scheduled recent builds for the past 13 days.
>>> > >>>>>>>>>>>>>>>>>>> >>> Not scheduling:
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>> Recent passing builds:
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>> > >>>>>>>>>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one-time cleanup. I agree
>>> > >>>>>>>>>>>>>>>>>>> >>>> that we need to have a solution to automate this task or address the
>>> > >>>>>>>>>>>>>>>>>>> >>>> root cause of the buildup.
>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com> wrote:
>>> > >>>>>>>>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Hi there,
>>> > >>>>>>>>>>>>>>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>> > >>>>>>>>>>>>>>>>>>> >>>>> both fail jobs with "No space left on device".
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Who is the best person to contact in these cases (someone with access
>>> > >>>>>>>>>>>>>>>>>>> >>>>> permissions to the workers)?
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>> > >>>>>>>>>>>>>>>>>>> >>>>> recently, and I'd like to discuss how this can be remedied. Can a
>>> > >>>>>>>>>>>>>>>>>>> >>>>> cleanup task be automated on Jenkins somehow?
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Regards,
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michal
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> --
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Michał Walenia
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>> > >>>>>>>>>>>>>>>>>>> >>>>> M: +48 791 432 002
>>> > >>>>>>>>>>>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>>> > >>>>>>>>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Unique Tech
>>> > >>>>>>>>>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>