I'm curious what you find. Was it /tmp or the workspaces using up the space?
On Fri, Jul 24, 2020 at 10:57 AM Tyson Hamilton <[email protected]> wrote:

> Bleck. I just realized that it is 'offline', so that won't work. I'll
> clean up manually on the machine using the cron command.
>
> On Fri, Jul 24, 2020 at 10:56 AM Tyson Hamilton <[email protected]> wrote:
>
>> Something isn't working with the current setup, because node 15 appears
>> to be out of space and is currently 'offline' according to Jenkins. Can
>> someone run the cleanup job? The machine is full:
>>
>> @apache-ci-beam-jenkins-15:/tmp$ df -h
>> Filesystem      Size  Used  Avail  Use%  Mounted on
>> udev             52G     0    52G    0%  /dev
>> tmpfs            11G  265M    10G    3%  /run
>> /dev/sda1       485G  484G   880M  100%  /
>> tmpfs            52G     0    52G    0%  /dev/shm
>> tmpfs           5.0M     0   5.0M    0%  /run/lock
>> tmpfs            52G     0    52G    0%  /sys/fs/cgroup
>> tmpfs            11G     0    11G    0%  /run/user/1017
>> tmpfs            11G     0    11G    0%  /run/user/1037
>>
>> @apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>> 20G   2020-07-24 17:52  .
>> 580M  2020-07-22 17:31  ./junit1031982597110125586
>> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
>> 517M  2020-07-22 17:31  ./junit1031982597110125586/junit8739924829337821410
>> 263M  2020-07-22 12:23  ./pip-install-2GUhO_
>> 263M  2020-07-20 09:30  ./pip-install-sxgwqr
>> 263M  2020-07-17 13:56  ./pip-install-bWSKIV
>> 242M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T
>> 242M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK
>> 242M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ
>> 236M  2020-07-21 20:25  ./beam-pipeline-tempmByU6T/tmpOWj3Yr
>> 236M  2020-07-21 20:21  ./beam-pipeline-tempV85xeK/tmppbQHB3
>> 236M  2020-07-21 20:15  ./beam-pipeline-temp7dJROJ/tmpgOXPKW
>> 111M  2020-07-23 00:57  ./pip-install-1JnyNE
>> 105M  2020-07-23 00:17  ./beam-artifact1374651823280819755
>> 105M  2020-07-23 00:16  ./beam-artifact5050755582921936972
>> 105M  2020-07-23 00:16  ./beam-artifact1834064452502646289
>> 105M  2020-07-23 00:15  ./beam-artifact682561790267074916
>> 105M  2020-07-23 00:15  ./beam-artifact4691304965824489394
>> 105M  2020-07-23 00:14  ./beam-artifact4050383819822604421
>>
>> On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <[email protected]> wrote:
>>
>>> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <[email protected]> wrote:
>>>
>>>> Ah I see, thanks Kenn. I found some advice on the Apache infra wiki
>>>> that also suggests using a tmp dir inside the workspace [1]:
>>>>
>>>> Procedures projects can take to clean up disk space
>>>>
>>>> Projects can help themselves and Infra by taking some basic steps to
>>>> clean up after their jobs on the build nodes:
>>>>
>>>> 1. Use a ./tmp dir in your job's workspace. That way it gets cleaned
>>>>    up when job workspaces expire.
>>>>
>>> Tests should be (able to be) written to use the standard temporary file
>>> mechanisms, and the environment set up on Jenkins such that those fall
>>> into the respective workspaces. Ideally this should be as simple as
>>> setting the TMPDIR (or similar) environment variable (and making sure it
>>> exists and is writable).
>>>
>>>> 2. Configure your jobs to wipe workspaces on start or finish.
>>>> 3. Configure your jobs to only keep 5 or 10 previous builds.
>>>> 4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>>>
>>>> [1]: https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>>>
>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>>> Those file listings look like the result of using standard temp file
>>>>> APIs but with TMPDIR set to /tmp.
>>>>>
>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <[email protected]> wrote:
>>>>>
>>>>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>>>>> inside of /tmp. Here is a quick look into two examples:
>>>>>>
>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>>> 1.6G  2020-07-21 02:25  .
>>>>>> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
>>>>>> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
>>>>>> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
>>>>>> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
>>>>>> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
>>>>>> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
>>>>>> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>>>> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>>>> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>>>> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>>>> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>>>> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>>>> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>>>> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>>>> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>>>> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>>>> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>>>> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>>>> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>>>>>>
>>>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>>>> 817M  2020-07-21 02:26  .
>>>>>> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
>>>>>> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
>>>>>> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
>>>>>> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>>>> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>>>> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>>>> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>>>> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>>>> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>>>> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>>>> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>>>> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>>>> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>>>> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>>>> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>>>> 992K  2020-07-12 12:00  ./junit642086915811430564
>>>>>> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
>>>>>> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
>>>>>> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <[email protected]> wrote:
>>>>>>
>>>>>>> You're right, job workspaces should be hermetic.
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm probably late to this discussion and missing something, but why
>>>>>>>> are we writing to /tmp at all? I would expect TMPDIR to point
>>>>>>>> somewhere inside the job directory that will be wiped by Jenkins, and
>>>>>>>> I would expect code to always create temp files via APIs that respect
>>>>>>>> this. Is Jenkins not cleaning up? Do we not have the ability to set
>>>>>>>> this up? Do we have bugs in our code (which we could probably find by
>>>>>>>> setting TMPDIR to somewhere other than /tmp and running the tests
>>>>>>>> without write permission to /tmp, etc.)?
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Related to workspace directory growth, +Udi Meiri <[email protected]>
>>>>>>>>> filed a relevant issue previously
>>>>>>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>>>>>>>>> the workspace directory after successful jobs. Alternatively, we can
>>>>>>>>> consider periodically cleaning up the /src directories.
>>>>>>>>>
>>>>>>>>> I would suggest moving the cron task from internal cron scripts to
>>>>>>>>> the inventory job
>>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>>>>>>> That way, we can see all the cron jobs as part of the source tree,
>>>>>>>>> adjust frequencies, and clean up code with PRs. I do not know how the
>>>>>>>>> internal cron scripts are created and maintained, or how they would
>>>>>>>>> be recreated for new worker instances.
>>>>>>>>>
>>>>>>>>> /cc +Tyson Hamilton <[email protected]>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> I've recently created a solution for the growing /tmp directory.
>>>>>>>>>> Part of it is the job mentioned by Tyson: beam_Clean_tmp_directory.
>>>>>>>>>> It's intentionally not triggered by cron and should be a last-resort
>>>>>>>>>> solution for some strange cases.
>>>>>>>>>>
>>>>>>>>>> Along with that job, I've also updated every worker with an
>>>>>>>>>> internal cron script. It is executed once a week and deletes all the
>>>>>>>>>> files (and only files) that were not accessed for at least three
>>>>>>>>>> days. That's designed to be as safe as possible for the jobs running
>>>>>>>>>> on the worker (so as not to delete files that are still in use), and
>>>>>>>>>> also to be insensitive to the current workload on the machine. The
>>>>>>>>>> cleanup will always happen, even if some long-running/stuck jobs are
>>>>>>>>>> blocking the machine.
>>>>>>>>>>
>>>>>>>>>> I also think that currently the "No space left" errors may be a
>>>>>>>>>> consequence of the growing workspace directory rather than /tmp. I
>>>>>>>>>> didn't do any detailed analysis, but e.g. currently on
>>>>>>>>>> apache-beam-jenkins-7 the workspace directory size is 158 GB while
>>>>>>>>>> /tmp is only 16 GB. We should either guarantee a disk size large
>>>>>>>>>> enough to hold the workspaces for all jobs (because eventually every
>>>>>>>>>> worker will execute each job) or also clear the workspaces in some
>>>>>>>>>> way.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Damian
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test
>>>>>>>>>>> failures while running. Not a Jenkins expert, but maybe there is a
>>>>>>>>>>> notion of running exclusively while no other tasks are running?
>>>>>>>>>>>
>>>>>>>>>>> -Max
>>>>>>>>>>>
>>>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>>>> > FYI, there was a job introduced to do this in Jenkins:
>>>>>>>>>>> > beam_Clean_tmp_directory.
>>>>>>>>>>> >
>>>>>>>>>>> > Currently it needs to be run manually. I'm seeing some
>>>>>>>>>>> > out-of-disk-related errors in precommit tests currently; perhaps
>>>>>>>>>>> > we should schedule this job with cron?
>>>>>>>>>>> >
>>>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <[email protected]> wrote:
>>>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for
>>>>>>>>>>> >> example:
>>>>>>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <[email protected]> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins older
>>>>>>>>>>> >>> than 3 days. Agree that we need a longer-term solution.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which
>>>>>>>>>>> >>> has not scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>>>> >>> Recent passing builds:
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <[email protected]> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>> +Alan Myrvold <[email protected]> is doing a one-time cleanup.
>>>>>>>>>>> >>>> I agree that we need a solution to automate this task or
>>>>>>>>>>> >>>> address the root cause of the buildup.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <[email protected]> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> Hi there,
>>>>>>>>>>> >>>>> It seems we have a problem with the Jenkins workers again.
>>>>>>>>>>> >>>>> Nodes 1 and 7 both fail jobs with "No space left on device".
>>>>>>>>>>> >>>>> Who is the best person to contact in these cases (someone
>>>>>>>>>>> >>>>> with access permissions to the workers)?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more
>>>>>>>>>>> >>>>> frequent recently, and I'd like to discuss how this can be
>>>>>>>>>>> >>>>> remedied. Can a cleanup task be automated on Jenkins somehow?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Regards,
>>>>>>>>>>> >>>>> Michal
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> --
>>>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>>>> >>>>> E: [email protected]
>>>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
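[Editor's note: Kenn's and Robert's point above, that standard temp-file APIs already respect TMPDIR, so pointing TMPDIR into the job workspace makes workspace expiry clean everything up, can be illustrated with a small Python sketch. The `ws-tmp` path below is a made-up stand-in for a real `$WORKSPACE/tmp`; this is not code from the Beam repo.]

```python
import os
import tempfile

# Stand-in for a per-job workspace tmp dir such as $WORKSPACE/tmp
# (hypothetical path, chosen only for this illustration).
workspace_tmp = os.path.join(os.getcwd(), "ws-tmp")
os.makedirs(workspace_tmp, exist_ok=True)

# Point TMPDIR into the workspace and reset tempfile's cached default
# so it re-reads the environment variable.
os.environ["TMPDIR"] = workspace_tmp
tempfile.tempdir = None

# From here on, standard temp-file APIs land inside the workspace,
# where Jenkins workspace expiry would eventually remove them.
scratch = tempfile.mkdtemp(prefix="beam-pipeline-temp")
```

In a Jenkins job this would amount to something like `export TMPDIR=$WORKSPACE/tmp` (plus creating the directory) before running the tests; code that bypasses the standard APIs and hardcodes /tmp would still leak, which is the class of bug Kenn suggests flushing out by removing write permission on /tmp.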
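[Editor's note: Damian's weekly cron cleanup, deleting files (and only files) not accessed for at least three days, can be sketched roughly as follows. This is an illustrative Python sketch, not the actual script deployed on the workers; the function name and `dry_run` parameter are invented for this example.]

```python
"""Sketch of an atime-based cleanup like the one described in the thread:
remove regular files (never directories) whose last access time is older
than a threshold, so files still in use by running jobs are left alone."""
import os
import time


def clean_old_files(root, max_age_days=3, dry_run=True):
    """Return (and optionally delete) files under `root` not accessed
    for at least `max_age_days` days."""
    cutoff = time.time() - max_age_days * 24 * 3600
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Only plain files (no symlinks, no directories), and only
            # those untouched since the cutoff.
            if (os.path.isfile(path) and not os.path.islink(path)
                    and st.st_atime < cutoff):
                if not dry_run:
                    try:
                        os.remove(path)
                    except OSError:
                        continue
                removed.append(path)
    return removed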