You're right, job workspaces should be hermetic.
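
A minimal sketch of what "hermetic" could mean for temp files, assuming each job exports the standard Jenkins WORKSPACE variable and that test code creates temp files through Python's tempfile module. The helper names use_workspace_tmp and create_scratch_file are made up for illustration and are not part of Beam or Jenkins.

```python
import os
import tempfile
from pathlib import Path


def use_workspace_tmp(workspace: str) -> Path:
    """Point TMPDIR at <workspace>/tmp so temp files are wiped with the workspace.

    Hypothetical helper: `workspace` would typically be os.environ["WORKSPACE"].
    """
    tmp_root = Path(workspace).resolve() / "tmp"
    tmp_root.mkdir(parents=True, exist_ok=True)
    os.environ["TMPDIR"] = str(tmp_root)
    # tempfile caches its directory choice; clear the cache so TMPDIR is re-read.
    tempfile.tempdir = None
    return tmp_root


def create_scratch_file(suffix: str = ".tmp") -> Path:
    """Create a temp file through an API that honours TMPDIR (never a literal /tmp)."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    return Path(name)
```

Code that hard-codes /tmp would bypass this, which is the kind of bug Kenn suggests hunting for by running the tests with /tmp made read-only.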
On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:

> I'm probably late to this discussion and missing something, but why are we
> writing to /tmp at all? I would expect TMPDIR to point somewhere inside the
> job directory that will be wiped by Jenkins, and I would expect code to
> always create temp files via APIs that respect this. Is Jenkins not
> cleaning up? Do we not have the ability to set this up? Do we have bugs in
> our code (that we could probably find by setting TMPDIR to somewhere
> not-/tmp and running the tests without write permission to /tmp, etc)
>
> Kenn
>
> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>
>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>> a relevant issue previously (
>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>> workspace directory after successful jobs. Alternatively, we can consider
>> periodically cleaning up the /src directories.
>>
>> I would suggest moving the cron task from internal cron scripts to the
>> inventory job (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>> That way, we can see all the cron jobs as part of the source tree, adjust
>> frequencies and clean up codes with PRs. I do not know how internal cron
>> scripts are created, maintained, and how would they be recreated for new
>> worker instances.
>>
>> /cc +Tyson Hamilton <tyso...@google.com>
>>
>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Hey,
>>>
>>> I've recently created a solution for the growing /tmp directory. Part of
>>> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>> intentionally not triggered by cron and should be a last resort solution
>>> for some strange cases.
>>>
>>> Along with that job, I've also updated every worker with an internal
>>> cron script. It's being executed once a week and deletes all the files (and
>>> only files) that were not accessed for at least three days. That's designed
>>> to be as safe as possible for the running jobs on the worker (not to delete
>>> the files that are still in use), and also to be insensitive to the current
>>> workload on the machine. The cleanup will always happen, even if some
>>> long-running/stuck jobs are blocking the machine.
>>>
>>> I also think that currently the "No space left" errors may be a
>>> consequence of growing workspace directory rather than /tmp. I didn't do
>>> any detailed analysis but e.g. currently, on apache-beam-jenkins-7 the
>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>> either guarantee the disk size to hold workspaces for all jobs (because
>>> eventually, every worker will execute each job) or clear also the
>>> workspaces in some way.
>>>
>>> Regards,
>>> Damian
>>>
>>>
>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>> running exclusively while no other tasks are running?
>>>>
>>>> -Max
>>>>
>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>> > FYI there was a job introduced to do this in Jenkins:
>>>> > beam_Clean_tmp_directory
>>>> >
>>>> > Currently it needs to be run manually. I'm seeing some out of disk
>>>> > related errors in precommit tests currently, perhaps we should schedule
>>>> > this job with cron?
>>>> >
>>>> >
>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>>> >>
>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
>>>> >>> Agree that we need a longer term solution.
>>>> >>>
>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>> >>> Recent passing builds:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>> >>>
>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>>> >>>
>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup. I agree
>>>> >>>> that we need to have a solution to automate this task or address the root
>>>> >>>> cause of the buildup.
>>>> >>>>
>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> Hi there,
>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>>> >>>>> both fail jobs with "No space left on device".
>>>> >>>>> Who is the best person to contact in these cases (someone with access
>>>> >>>>> permissions to the workers).
>>>> >>>>>
>>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>>> >>>>> recently and I'd like to discuss how can this be remedied. Can a cleanup
>>>> >>>>> task be automated on Jenkins somehow?
>>>> >>>>>
>>>> >>>>> Regards
>>>> >>>>> Michal
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>>
>>>> >>>>> Michał Walenia
>>>> >>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>> >>>>>
>>>> >>>>> M: +48 791 432 002
>>>> >>>>> E: michal.wale...@polidea.com
>>>> >>>>>
>>>> >>>>> Unique Tech
>>>> >>>>> Check out our projects! <https://www.polidea.com/our-work>
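
For reference, the cleanup policy Damian describes in the quoted thread (delete files, and only files, that were not accessed for at least three days) could be sketched roughly as below. The actual cron script is not shown anywhere in this thread, so this is an illustration under stated assumptions, not the deployed implementation; the function name remove_stale_files and its parameters are hypothetical.

```python
import os
import stat
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_files(root: str, max_idle_days: float = 3.0,
                       now: Optional[float] = None,
                       dry_run: bool = False) -> List[Path]:
    """Delete regular files under `root` whose last access is older than the cutoff.

    Directories and symlinks are left in place. Returns the stale paths found
    (with dry_run=True they are only reported, not deleted).
    """
    current = time.time() if now is None else now
    cutoff = current - max_idle_days * 86400  # seconds per day
    stale: List[Path] = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()
                if not stat.S_ISREG(info.st_mode):
                    continue  # skip symlinks, sockets, FIFOs, ...
                if info.st_atime >= cutoff:
                    continue  # accessed recently, possibly still in use
                if not dry_run:
                    path.unlink()
            except OSError:
                continue  # file vanished or is not removable; move on
            stale.append(path)
    return stale
```

A dry run such as remove_stale_files("/tmp", dry_run=True) lists candidates without touching them. Whether access times are reliable depends on how the filesystem is mounted (for example, noatime would defeat this check), so that is worth verifying on the workers.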