You're right, job workspaces should be hermetic.
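
For what it's worth, here is a minimal sketch of the TMPDIR idea Kenn describes below, assuming a test harness written in Python. The helper name `use_workspace_tmp` and the `tmp` subdirectory are made up for illustration; none of this is taken from our Jenkins configuration.

```python
import os
import tempfile
from pathlib import Path


def use_workspace_tmp(workspace: str) -> Path:
    """Point TMPDIR at <workspace>/tmp (hypothetical layout) and return that path."""
    tmp_dir = Path(workspace) / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    # Child processes started after this point inherit the new TMPDIR.
    os.environ["TMPDIR"] = str(tmp_dir)
    # The tempfile module caches its directory choice; clearing the cache
    # makes it re-read TMPDIR on the next call.
    tempfile.tempdir = None
    return tmp_dir
```

Called with the job's workspace path at the start of a build, anything created through `tempfile` (or by child processes that honor TMPDIR) should land inside the workspace and be removed when the workspace is wiped. Code that hard-codes /tmp would still escape, which is the kind of bug Kenn suggests hunting for.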


On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles <k...@apache.org> wrote:

> I'm probably late to this discussion and missing something, but why are we
> writing to /tmp at all? I would expect TMPDIR to point somewhere inside the
> job directory that will be wiped by Jenkins, and I would expect code to
> always create temp files via APIs that respect this. Is Jenkins not
> cleaning up? Do we not have the ability to set this up? Do we have bugs in
> our code (that we could probably find by setting TMPDIR to somewhere
> not-/tmp and running the tests without write permission to /tmp, etc.)?
>
> Kenn
>
> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay <al...@google.com> wrote:
>
>> Related to workspace directory growth, +Udi Meiri <eh...@google.com> filed
>> a relevant issue previously (
>> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
>> workspace directory after successful jobs. Alternatively, we can consider
>> periodically cleaning up the /src directories.
>>
>> I would suggest moving the cron task from internal cron scripts to the
>> inventory job (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
>> That way, we can see all the cron jobs as part of the source tree, and adjust
>> frequencies and cleanup code via PRs. I do not know how internal cron
>> scripts are created and maintained, or how they would be recreated for new
>> worker instances.
>>
>> /cc +Tyson Hamilton <tyso...@google.com>
>>
>> On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Hey,
>>>
>>> I've recently created a solution for the growing /tmp directory. Part of
>>> it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
>>> intentionally not triggered by cron and should be a last resort solution
>>> for some strange cases.
>>>
>>> Along with that job, I've also updated every worker with an internal
>>> cron script. It runs once a week and deletes all files (and only files)
>>> that have not been accessed for at least three days. That's designed
>>> to be as safe as possible for jobs running on the worker (it does not delete
>>> files that are still in use), and also to be insensitive to the current
>>> workload on the machine. The cleanup will always happen, even if some
>>> long-running/stuck jobs are blocking the machine.
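
To make the policy above concrete, here is a rough sketch of an access-time-based cleanup in Python. It is not Damian's actual cron script, which I have not seen; the function name `delete_stale_files` and its parameters are assumptions for illustration. It removes regular files only and leaves directories and symlinks in place.

```python
import os
import stat
import time
from typing import List


def delete_stale_files(root: str, max_idle_days: float = 3.0) -> List[str]:
    """Delete regular files under root not accessed for max_idle_days; return their paths."""
    cutoff = time.time() - max_idle_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
                if stat.S_ISREG(info.st_mode) and info.st_atime < cutoff:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or is not removable; skip it and move on.
                continue
    return removed
```

One caveat with any approach like this: it relies on access times being recorded, so a filesystem mounted with `noatime` would make files look older than they are.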
>>>
>>> I also think that the current "No space left" errors may be a
>>> consequence of the growing workspace directory rather than /tmp. I didn't do
>>> any detailed analysis, but currently, on apache-beam-jenkins-7, the
>>> workspace directory size is 158 GB while /tmp is only 16 GB. We should
>>> either guarantee enough disk space to hold workspaces for all jobs (because
>>> eventually, every worker will execute each job) or also clear the
>>> workspaces in some way.
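
If someone wants to repeat the comparison above on other workers, a small sketch along these lines might do. The function names are made up for illustration, and it sums apparent file sizes rather than allocated disk blocks, so the totals can differ somewhat from what `du` reports.

```python
import os
import stat
from typing import Iterable, List, Tuple


def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file disappeared while walking
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def largest_first(roots: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, size) pairs sorted from largest to smallest."""
    sizes = [(root, dir_size_bytes(root)) for root in roots]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

Passing the worker's workspace directory and /tmp to `largest_first` would show which of the two dominates on a given machine.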
>>>
>>> Regards,
>>> Damian
>>>
>>>
>>> On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> +1 for scheduling it via a cron job if it won't lead to test failures
>>>> while running. Not a Jenkins expert but maybe there is the notion of
>>>> running exclusively while no other tasks are running?
>>>>
>>>> -Max
>>>>
>>>> On 17.07.20 21:49, Tyson Hamilton wrote:
>>>> > FYI there was a job introduced to do this in Jenkins: beam_Clean_tmp_directory
>>>> >
>>>> > Currently it needs to be run manually. I'm seeing some out-of-disk
>>>> > errors in precommit tests at the moment; perhaps we should schedule
>>>> > this job with cron?
>>>> >
>>>> >
>>>> > On 2020/03/11 19:31:13, Heejong Lee <heej...@google.com> wrote:
>>>> >> Still seeing no space left on device errors on jenkins-7 (for example:
>>>> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold <amyrv...@google.com> wrote:
>>>> >>
>>>> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
>>>> >>> Agree that we need a longer term solution.
>>>> >>>
>>>> >>> Passing recent tests on all executors except jenkins-12, which has not
>>>> >>> scheduled recent builds for the past 13 days. Not scheduling:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
>>>> >>> Recent passing builds:
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds
>>>> >>> https://builds.apache.org/computer/apache-beam-jenkins-16/builds
>>>> >>>
>>>> >>> On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay <al...@google.com> wrote:
>>>> >>>
>>>> >>>> +Alan Myrvold <amyrv...@google.com> is doing a one time cleanup. I agree
>>>> >>>> that we need to have a solution to automate this task or address the root
>>>> >>>> cause of the buildup.
>>>> >>>>
>>>> >>>> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia <michal.wale...@polidea.com>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> Hi there,
>>>> >>>>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7
>>>> >>>>> both fail jobs with "No space left on device".
>>>> >>>>> Who is the best person to contact in these cases (someone with access
>>>> >>>>> permissions to the workers)?
>>>> >>>>>
>>>> >>>>> I also noticed that such errors are becoming more and more frequent
>>>> >>>>> recently, and I'd like to discuss how this can be remedied. Can a cleanup
>>>> >>>>> task be automated on Jenkins somehow?
>>>> >>>>>
>>>> >>>>> Regards
>>>> >>>>> Michal
>>>> >>>>>
>>>> >>>>
>>>> >>
>>>>
>>>
