Something isn't working with the current setup: node 15 appears to be out of disk space and is currently 'offline' according to Jenkins. Can someone run the cleanup job? The machine is full:
@apache-ci-beam-jenkins-15:/tmp$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             52G     0   52G   0% /dev
tmpfs            11G  265M   10G   3% /run
*/dev/sda1      485G  484G  880M 100% /*
tmpfs            52G     0   52G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            52G     0   52G   0% /sys/fs/cgroup
tmpfs            11G     0   11G   0% /run/user/1017
tmpfs            11G     0   11G   0% /run/user/1037

apache-ci-beam-jenkins-15:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
20G  2020-07-24 17:52 .
580M 2020-07-22 17:31 ./junit1031982597110125586
517M 2020-07-22 17:31 ./junit1031982597110125586/junit8739924829337821410/heap_dump.hprof
517M 2020-07-22 17:31 ./junit1031982597110125586/junit8739924829337821410
263M 2020-07-22 12:23 ./pip-install-2GUhO_
263M 2020-07-20 09:30 ./pip-install-sxgwqr
263M 2020-07-17 13:56 ./pip-install-bWSKIV
242M 2020-07-21 20:25 ./beam-pipeline-tempmByU6T
242M 2020-07-21 20:21 ./beam-pipeline-tempV85xeK
242M 2020-07-21 20:15 ./beam-pipeline-temp7dJROJ
236M 2020-07-21 20:25 ./beam-pipeline-tempmByU6T/tmpOWj3Yr
236M 2020-07-21 20:21 ./beam-pipeline-tempV85xeK/tmppbQHB3
236M 2020-07-21 20:15 ./beam-pipeline-temp7dJROJ/tmpgOXPKW
111M 2020-07-23 00:57 ./pip-install-1JnyNE
105M 2020-07-23 00:17 ./beam-artifact1374651823280819755
105M 2020-07-23 00:16 ./beam-artifact5050755582921936972
105M 2020-07-23 00:16 ./beam-artifact1834064452502646289
105M 2020-07-23 00:15 ./beam-artifact682561790267074916
105M 2020-07-23 00:15 ./beam-artifact4691304965824489394
105M 2020-07-23 00:14 ./beam-artifact4050383819822604421

On Wed, Jul 22, 2020 at 12:03 PM Robert Bradshaw <rober...@google.com> wrote:

> On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton <tyso...@google.com> wrote:
>
>> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that also suggests using a tmpdir inside the workspace [1]:
>>
>> Procedures projects can take to clean up disk space:
>>
>> Projects can help themselves and Infra by taking some basic steps to help clean up their jobs after themselves on the build nodes.
>>
>> 1. Use a ./tmp dir in your job's workspace. That way it gets cleaned up when job workspaces expire.
>
> Tests should be (able to be) written to use the standard temporary file mechanisms, and the environment set up on Jenkins such that that falls into the respective workspaces. Ideally this should be as simple as setting the TMPDIR (or similar) environment variable (and making sure it exists and is writable).
>
>> 2. Configure your jobs to wipe workspaces on start or finish.
>> 3. Configure your jobs to only keep 5 or 10 previous builds.
>> 4. Configure your jobs to only keep 5 or 10 previous artifacts.
>>
>> [1]: https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>>
>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> Those file listings look like the result of using standard temp file APIs but with TMPDIR set to /tmp.
>>>
>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <tyso...@google.com> wrote:
>>>
>>>> Jobs are hermetic as far as I can tell and use unique subdirectories inside of /tmp. Here is a quick look into two examples:
>>>>
>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>> 1.6G 2020-07-21 02:25 .
>>>> 242M 2020-07-17 18:48 ./beam-pipeline-temp3ybuY4
>>>> 242M 2020-07-17 18:46 ./beam-pipeline-tempuxjiPT
>>>> 242M 2020-07-17 18:44 ./beam-pipeline-tempVpg1ME
>>>> 242M 2020-07-17 18:42 ./beam-pipeline-tempJ4EpyB
>>>> 242M 2020-07-17 18:39 ./beam-pipeline-tempepea7Q
>>>> 242M 2020-07-17 18:35 ./beam-pipeline-temp79qot2
>>>> 236M 2020-07-17 18:48 ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>>> 236M 2020-07-17 18:46 ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>>> 236M 2020-07-17 18:44 ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>>> 236M 2020-07-17 18:42 ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>>> 236M 2020-07-17 18:39 ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>>> 236M 2020-07-17 18:35 ./beam-pipeline-temp79qot2/tmpvN7vWA
>>>> 3.7M 2020-07-17 18:48 ./beam-pipeline-temp3ybuY4/tmprlh_di
>>>> 3.7M 2020-07-17 18:46 ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>>> 3.7M 2020-07-17 18:44 ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>>> 3.7M 2020-07-17 18:42 ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>>> 3.7M 2020-07-17 18:39 ./beam-pipeline-tempepea7Q/tmptYF1v1
>>>> 3.7M 2020-07-17 18:35 ./beam-pipeline-temp79qot2/tmplfV0Rg
>>>> 2.7M 2020-07-17 20:10 ./pip-install-q9l227ef
>>>>
>>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
>>>> 817M 2020-07-21 02:26 .
>>>> 242M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM
>>>> 242M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3
>>>> 242M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq
>>>> 236M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>>> 236M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>>> 236M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>>> 3.7M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>>> 3.7M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>>> 3.7M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>>> 2.0M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>>> 2.0M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>>> 2.0M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>>> 1.2M 2020-07-19 12:14 ./beam-pipeline-tempUTXqlM/tmphgdj35
>>>> 1.2M 2020-07-19 12:11 ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>>> 1.2M 2020-07-19 12:05 ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>>> 992K 2020-07-12 12:00 ./junit642086915811430564
>>>> 988K 2020-07-12 12:00 ./junit642086915811430564/beam
>>>> 984K 2020-07-12 12:00 ./junit642086915811430564/beam/nodes
>>>> 980K 2020-07-12 12:00 ./junit642086915811430564/beam/nodes/0
>>>>
>>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> You're right, job workspaces should be hermetic.
>>>>>
>>>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>>
>>>>>> I'm probably late to this discussion and missing something, but why are we writing to /tmp at all? I would expect TMPDIR to point somewhere inside the job directory that will be wiped by Jenkins, and I would expect code to always create temp files via APIs that respect this. Is Jenkins not cleaning up? Do we not have the ability to set this up?
>>>>>> Do we have bugs in our code (that we could probably find by setting TMPDIR to somewhere that is not /tmp and running the tests without write permission to /tmp, etc.)?
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed a relevant issue previously (https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up the workspace directory after successful jobs. Alternatively, we can consider periodically cleaning up the /src directories.
>>>>>>>
>>>>>>> I would suggest moving the cron task from internal cron scripts to the inventory job (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51). That way, we can see all the cron jobs as part of the source tree, and adjust frequencies and clean up code with PRs. I do not know how the internal cron scripts are created and maintained, or how they would be recreated for new worker instances.
>>>>>>>
>>>>>>> /cc +Tyson Hamilton <tyso...@google.com>
>>>>>>>
>>>>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <damian.gadom...@polidea.com> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> I've recently created a solution for the growing /tmp directory. Part of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's intentionally not triggered by cron and should be a last-resort solution for some strange cases.
>>>>>>>>
>>>>>>>> Along with that job, I've also updated every worker with an internal cron script. It's executed once a week and deletes all the files (and only files) that were not accessed for at least three days.
>>>>>>>> That's designed to be as safe as possible for the jobs running on the worker (not deleting files that are still in use), and also to be insensitive to the current workload on the machine. The cleanup will always happen, even if some long-running or stuck jobs are blocking the machine.
>>>>>>>>
>>>>>>>> I also think that currently the "No space left" errors may be a consequence of the growing workspace directory rather than /tmp. I didn't do any detailed analysis, but, for example, on apache-beam-jenkins-7 the workspace directory size is currently 158 GB while /tmp is only 16 GB. We should either guarantee that the disk is large enough to hold workspaces for all jobs (because eventually every worker will execute each job) or also clear the workspaces in some way.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> +1 for scheduling it via a cron job if it won't lead to test failures while running. Not a Jenkins expert, but maybe there is a notion of running exclusively while no other tasks are running?
>>>>>>>>>
>>>>>>>>> -Max
>>>>>>>>>
>>>>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>>>>> > FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory
>>>>>>>>> >
>>>>>>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk related errors in precommit tests currently; perhaps we should schedule this job with cron?
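[For reference: the atime-based weekly cleanup Damian describes above could look roughly like the sketch below. The actual script on the workers isn't shown in this thread, so the function name, the target directory, and the exact threshold here are illustrative assumptions, not the real implementation.]

```shell
#!/bin/bash
# Sketch of a files-only, atime-based cleanup like the one described above.
# It deletes only regular files (never directories) whose last access time
# is at least ~3 days old, so files still in use by running jobs survive.
cleanup_old_files() {
  # $1: directory to clean (e.g. /tmp in the real cron job)
  # -xdev: stay on one filesystem; -type f: regular files only;
  # -atime +2: last accessed more than 2 full 24h periods ago (~3+ days).
  find "$1" -xdev -type f -atime +2 -delete
}

# Demo on a scratch directory rather than the real /tmp:
demo=$(mktemp -d)
touch -a -d "5 days ago" "$demo/stale.log"   # backdate access time (GNU touch)
touch "$demo/fresh.log"
cleanup_old_files "$demo"
ls "$demo"
```

Note that `find -atime` depends on access times actually being recorded; on volumes mounted with `noatime` a modification-time check (`-mtime`) would be the safer predicate.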
>>>>>>>>> >
>>>>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>>>>>>> >> Still seeing "no space left on device" errors on jenkins-7 (for example: https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Did a one-time cleanup of tmp files owned by jenkins older than 3 days. Agree that we need a longer-term solution.
>>>>>>>>> >>>
>>>>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has not scheduled recent builds for the past 13 days. Not scheduling:
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>>>>> >>> Recent passing builds:
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>>>>> >>>
>>>>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one-time cleanup. I agree that we need a solution to automate this task or address the root cause of the buildup.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>>> Hi there,
>>>>>>>>> >>>>> It seems we have a problem with Jenkins workers again. Nodes 1 and 7 both fail jobs with "No space left on device". Who is the best person to contact in these cases (someone with access permissions to the workers)?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I also noticed that such errors are becoming more and more frequent recently, and I'd like to discuss how this can be remedied. Can a cleanup task be automated on Jenkins somehow?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Regards,
>>>>>>>>> >>>>> Michal
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> --
>>>>>>>>> >>>>> Michał Walenia
>>>>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>>> >>>>> M: +48 791 432 002
>>>>>>>>> >>>>> E: michal.wale...@polidea.com
>>>>>>>>> >>>>> Unique Tech
>>>>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
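[A recurring recommendation in the thread above is to point TMPDIR into the Jenkins workspace so that standard temp-file APIs write somewhere that expires with the workspace. A minimal sketch of what a job wrapper could do follows; `WORKSPACE` is the variable Jenkins sets for a build, while the `tmp` subdirectory layout and everything else here is an illustrative assumption, not Beam's actual job configuration.]

```shell
#!/bin/bash
# Sketch: route temp files into the job workspace instead of the shared /tmp.
# WORKSPACE is provided by Jenkins; fall back to the current directory when
# running outside Jenkins (e.g. locally).
set -euo pipefail

export TMPDIR="${WORKSPACE:-$PWD}/tmp"
mkdir -p "$TMPDIR"

# mktemp and most standard temp-file APIs (e.g. Python's tempfile) consult
# TMPDIR, so temp files now land under the workspace and get cleaned up when
# the workspace expires or is wiped.
scratch=$(mktemp)
echo "temp files now go under: $TMPDIR (created: $scratch)"
```

Note the caveat from the thread: this only helps code that goes through the standard temp-file mechanisms, and some runtimes need extra plumbing (the JVM, for instance, takes its temp directory from `java.io.tmpdir` rather than TMPDIR).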