Those file listings look like the result of using standard temp file APIs but with TMPDIR set to /tmp.
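For illustration, a minimal Python sketch of how the standard tempfile APIs pick their target directory. The beam-pipeline-temp prefix and the workspace path are assumptions inferred from the listings quoted below, not taken from Beam's code:

    import os
    import tempfile

    # tempfile honors the TMPDIR environment variable when choosing where
    # temp files and directories go; with TMPDIR unset or pointing at /tmp,
    # everything lands under /tmp, which matches the listings below.
    print(tempfile.gettempdir())  # e.g. /tmp on the Jenkins workers

    # A prefix like this yields directories named beam-pipeline-tempXXXXXX.
    # The prefix is inferred from the listings, not confirmed from Beam's code.
    staging_dir = tempfile.mkdtemp(prefix="beam-pipeline-temp")
    print(staging_dir)

    # Pointing TMPDIR at a job-local directory (as suggested later in the
    # thread) redirects these APIs without code changes. tempfile caches its
    # choice, so TMPDIR must be set before the first temp file is created, or
    # the cache cleared as below. The workspace path here is hypothetical.
    os.environ["TMPDIR"] = "/home/jenkins/workspace/some-job/tmp"
    tempfile.tempdir = None  # clear the cached default so TMPDIR is re-read
    print(tempfile.gettempdir())

The nested tmpXXXXXX directories inside each beam-pipeline-temp* directory are consistent with further mkdtemp()/mkstemp() calls using the default prefix.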
On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <[email protected]> wrote:

> Jobs are hermetic as far as I can tell and use unique subdirectories
> inside of /tmp. Here is a quick look into two examples:
>
> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> 1.6G  2020-07-21 02:25  .
> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>
> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 | head -n 20
> 817M  2020-07-21 02:26  .
> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> 992K  2020-07-12 12:00  ./junit642086915811430564
> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>
> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri <[email protected]> wrote:
>
>> You're right, job workspaces should be hermetic.
>>
>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <[email protected]> wrote:
>>
>>> I'm probably late to this discussion and missing something, but why are
>>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>>> the job directory that will be wiped by Jenkins, and I would expect code to
>>> always create temp files via APIs that respect this. Is Jenkins not
>>> cleaning up? Do we not have the ability to set this up?
>>> Do we have bugs in our code (that we could probably find by setting
>>> TMPDIR to somewhere not-/tmp and running the tests without write
>>> permission to /tmp, etc.)?
>>>
>>> Kenn
>>>
>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <[email protected]> wrote:
>>>
>>>> Related to workspace directory growth, +Udi Meiri <[email protected]> filed
>>>> a relevant issue previously
>>>> (https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up the
>>>> workspace directory after successful jobs. Alternatively, we can consider
>>>> periodically cleaning up the /src directories.
>>>>
>>>> I would suggest moving the cron task from internal cron scripts to the
>>>> inventory job
>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>>>> That way, we can see all the cron jobs as part of the source tree, adjust
>>>> frequencies, and clean up code with PRs. I do not know how internal cron
>>>> scripts are created, maintained, and how they would be recreated for new
>>>> worker instances.
>>>>
>>>> /cc +Tyson Hamilton <[email protected]>
>>>>
>>>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <[email protected]> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> I've recently created a solution for the growing /tmp directory. Part
>>>>> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>>>> intentionally not triggered by cron and should be a last-resort solution
>>>>> for some strange cases.
>>>>>
>>>>> Along with that job, I've also updated every worker with an internal
>>>>> cron script. It's executed once a week and deletes all the files (and
>>>>> only files) that were not accessed for at least three days. That's
>>>>> designed to be as safe as possible for the jobs running on the worker
>>>>> (so as not to delete files that are still in use), and also to be
>>>>> insensitive to the current workload on the machine. The cleanup will
>>>>> always happen, even if some long-running/stuck jobs are blocking the
>>>>> machine.
>>>>>
>>>>> I also think that currently the "No space left" errors may be a
>>>>> consequence of the growing workspace directory rather than /tmp. I didn't
>>>>> do any detailed analysis, but e.g. currently, on apache-beam-jenkins-7 the
>>>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>>>> either guarantee a disk size that can hold workspaces for all jobs
>>>>> (because eventually, every worker will execute each job) or also clear
>>>>> the workspaces in some way.
>>>>>
>>>>> Regards,
>>>>> Damian
>>>>>
>>>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <[email protected]> wrote:
>>>>>
>>>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>>>> while running. Not a Jenkins expert, but maybe there is a notion of
>>>>>> running exclusively while no other tasks are running?
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>>>> > FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory
>>>>>> >
>>>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk
>>>>>> > related errors in precommit tests currently; perhaps we should
>>>>>> > schedule this job with cron?
>>>>>> >
>>>>>> > On 2020/03/11 19:31:13, Heejong Lee <[email protected]> wrote:
>>>>>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>>>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>>>> >>
>>>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <[email protected]> wrote:
>>>>>> >>
>>>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
>>>>>> >>> Agree that we need a longer term solution.
>>>>>> >>>
>>>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>>>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>>>> >>> Recent passing builds:
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>>>> >>>
>>>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>>> +Alan Myrvold <[email protected]> is doing a one-time cleanup. I agree
>>>>>> >>>> that we need to have a solution to automate this task or address the
>>>>>> >>>> root cause of the buildup.
>>>>>> >>>>
>>>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <[email protected]> wrote:
>>>>>> >>>>
>>>>>> >>>>> Hi there,
>>>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>>>>> >>>>> both fail jobs with "No space left on device".
>>>>>> >>>>> Who is the best person to contact in these cases (someone with access
>>>>>> >>>>> permissions to the workers)?
>>>>>> >>>>>
>>>>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>>>>> >>>>> recently and I'd like to discuss how this can be remedied. Can a
>>>>>> >>>>> cleanup task be automated on Jenkins somehow?
>>>>>> >>>>>
>>>>>> >>>>> Regards,
>>>>>> >>>>> Michal
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>> Michał Walenia
>>>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>> >>>>> M: +48 791 432 002
>>>>>> >>>>> E: [email protected]
>>>>>> >>>>>
>>>>>> >>>>> Unique Tech
>>>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
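
For reference, a minimal Python sketch of the age-based cleanup policy Damian describes above (delete only regular files, never directories, that have not been accessed for at least three days). This is an illustration of the policy under stated assumptions, not the actual cron script installed on the workers; the /tmp path and three-day threshold are taken from the thread, everything else is assumed:

    #!/usr/bin/env python3
    """Sketch of an age-based /tmp cleanup: remove only regular files whose
    last access time is older than a threshold. Not the workers' real script."""
    import os
    import time

    TMP_ROOT = "/tmp"                 # assumed target directory
    MAX_IDLE_SECONDS = 3 * 24 * 3600  # "not accessed for at least three days"

    def clean_tmp(root: str = TMP_ROOT, max_idle: int = MAX_IDLE_SECONDS) -> None:
        now = time.time()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # don't follow symlinks
                    if now - st.st_atime > max_idle:
                        os.remove(path)
                except (FileNotFoundError, PermissionError):
                    # A job may have removed the file already, or we lack
                    # rights; skip it and keep going.
                    continue

    if __name__ == "__main__":
        clean_tmp()

Deleting by access time rather than by job ownership is what keeps this safe to run on a loaded worker: files still being read or written by a running job are recently accessed and therefore left alone.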
