SGTM. Thank you very much Daniel!

On Mon, Oct 11, 2021, 7:51 PM Ahmet Altay <al...@google.com> wrote:

> Thank you Daniel. Could you please update the wiki once you are done with
> the process?
>
> On Mon, Oct 11, 2021 at 6:22 PM Daniel Oliveira <danolive...@google.com>
> wrote:
>
>> Took me a bit to get to this, sorry. I finally figured out an approach
>> for updating Go, applied it, and will be updating the image momentarily.
>>
>> I think a more important note is that I tried what Valentyn was
>> considering, which is SSHing into workers and updating the dependency. I'll
>> describe the process below, but the summary is that I did it on one worker
>> with Go so far, saw no problems over the weekend, and would like to
>> continue updating the rest of the workers if there are no objections.
>>
>> Here's a step-by-step of what I did. If we decide to stick with this
>> approach, these instructions can be added to Confluence:
>>
>> 1. Go to the page for the Jenkins agent you want to update [1] and click
>> "Mark this node temporarily offline", leaving a reason such as "Updating X
>> dependency."
>> 2. Wait until there are no more tests running in that agent (under "Build
>> Executor Status" on the left of the page).
>> 3. SSH into the agent and perform the update.
>> 4. Mark the node as online again.
>> 5. Repeat for every worker.
>>
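
The steps above can be sketched as a small loop. This is only an illustration, not tooling that exists in Beam: the function name and the four callables (mark_offline, is_idle, update, mark_online) are hypothetical stand-ins for the manual Jenkins UI clicks and the SSH session described in the list.

```python
import time
from typing import Callable, Iterable, List


def rolling_update(
    agents: Iterable[str],
    mark_offline: Callable[[str, str], None],  # hypothetical: step 1
    is_idle: Callable[[str], bool],            # hypothetical: step 2 check
    update: Callable[[str], None],             # hypothetical: step 3 (SSH)
    mark_online: Callable[[str], None],        # hypothetical: step 4
    reason: str = "Updating X dependency.",
    poll_seconds: float = 30.0,
) -> List[str]:
    """Update agents one at a time and return the names that were updated."""
    updated = []
    for agent in agents:
        # Step 1: stop the agent from picking up new jobs.
        mark_offline(agent, reason)
        # Step 2: wait until no tests are still running on it.
        while not is_idle(agent):
            time.sleep(poll_seconds)
        # Step 3: perform the update. If this raises, the loop stops and
        # the agent is left offline so it can be inspected by hand.
        update(agent)
        # Step 4: put the agent back into rotation.
        mark_online(agent)
        updated.append(agent)
    return updated


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without touching Jenkins.
    running = {"apache-beam-jenkins-1": 2, "apache-beam-jenkins-2": 0}

    def fake_is_idle(agent: str) -> bool:
        # Pretend one running test finishes each time we poll.
        if running[agent] > 0:
            running[agent] -= 1
            return False
        return True

    rolling_update(
        running.keys(),
        mark_offline=lambda a, r: print(f"offline {a}: {r}"),
        is_idle=fake_is_idle,
        update=lambda a: print(f"updating {a}"),
        mark_online=lambda a: print(f"online {a}"),
        poll_seconds=0,
    )
```

Only one agent is out of rotation at any time, so the remaining workers can pick up the extra load while each update runs.
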
>> And these are some additional steps if you want to immediately run a test
>> suite to check that the update worked correctly. For example in my case, I
>> wanted to check against the Go Postcommit, and it was a good thing I did,
>> because it actually failed the first time and I had to go back in to fix a
>> small oversight I made. So doing this after you update your first worker is
>> probably a good idea before updating the rest:
>>
>> 1. Go to the page for the job you want to run (for example: [2]).
>> 2. Click "Configure" on the left menu.
>> 3. Find the checkbox "Restrict where this project can be run" and change
>> the restriction from "beam" to the specific name of the agent (e.g.
>> "apache-beam-jenkins-1").
>> 4. Save and apply that change.
>> 5. Back on the page for the job, click "Build with Parameters" on the
>> left menu.
>> 6. Run the build on "master".
>> 7. Once you're done checking the results, change the restriction for the
>> job back to "beam". (This also gets reset once every 24 hours in case you
>> forget.)
>>
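
The pin, build, restore pattern in steps 3 to 7 can be sketched as a context manager, so the restriction is put back even if the check fails. This is an illustration under assumptions: pinned_to_agent, get_label and set_label are hypothetical names standing in for the manual "Configure" edits, not a Jenkins API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def pinned_to_agent(
    job: str,
    agent: str,
    get_label: Callable[[str], str],        # hypothetical: read restriction
    set_label: Callable[[str, str], None],  # hypothetical: write restriction
) -> Iterator[None]:
    """Temporarily restrict a job to one agent, then restore the old value."""
    original = get_label(job)  # "beam" in the steps above
    set_label(job, agent)      # steps 3-4: restrict to the one agent
    try:
        yield                  # steps 5-6: trigger and check the build here
    finally:
        set_label(job, original)  # step 7: always restore the restriction


if __name__ == "__main__":
    # In-memory stand-in for the job configuration.
    labels = {"beam_PostCommit_Go": "beam"}
    with pinned_to_agent(
        "beam_PostCommit_Go",
        "apache-beam-jenkins-2",
        labels.__getitem__,
        labels.__setitem__,
    ):
        print("restricted to:", labels["beam_PostCommit_Go"])
    print("restored to:", labels["beam_PostCommit_Go"])
```
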
>> I did that on one agent (apache-beam-jenkins-2) on Friday evening when it
>> wasn't too busy, and got Go updated and working. I checked that agent's
>> execution history again today just in case, and it was healthy over
>> the weekend, with no Go-related problems as far as I could see. If there
>> are no objections I'd like to go ahead and continue updating the rest of the
>> workers (I'll do this late at night or over the weekend to avoid disrupting
>> dev work).
>>
>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-1/
>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Go/
>>
>> On Mon, Oct 4, 2021 at 6:14 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> I updated the image in [1], but did not change the workers to pick up
>>> the new image yet. We can do this once we add Go changes on top of it.
>>>
>>> I am also considering SSHing into every worker and running a one-line
>>> command that adds the dependency that was missing. It seems to be low risk,
>>> and there is a fallback plan to restart the worker using the saved image:
>>> both new and old images are saved and available in Cloud Console.
>>>
>>> Ideally, we should find a way to do a rolling upgrade that a PMC or
>>> committer could trigger without logging into every machine.
>>>
>>> [1]
>>> https://issues.apache.org/jira/browse/BEAM-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424228#comment-17424228
>>>
>>>
>>> On Wed, Sep 22, 2021 at 3:28 PM Daniel Oliveira <danolive...@google.com>
>>> wrote:
>>>
>>>> @Brian Hulette <bhule...@google.com> That button seems like exactly
>>>> what we'd need. Doing it manually would be a pain, but it's probably still
>>>> preferable to causing a bunch of aborted tests.
>>>>
>>>> @Valentyn Tymofieiev <valen...@google.com> Collaborating to do both
>>>> updates at once is a great idea! I'll message you directly about it.
>>>>
>>>> On Wed, Sep 22, 2021 at 2:44 PM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>>
>>>>> I am also interested in updating the version of Python on the VMs; I
>>>>> need to install Python 3.9. Thanks for looking into this. We can
>>>>> coordinate to make one update instead of two.
>>>>>
>>>>> On Wed, Sep 22, 2021 at 2:40 PM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I'm not sure about best practices here. Out of curiosity I just poked
>>>>>> around in the Jenkins UI (e.g. [1]) and it looks like you can manually
>>>>>> "Mark node temporarily offline" when logged in (if you're a committer).
>>>>>> According to [2] this will prevent it from picking up new jobs after it's
>>>>>> finished the currently executing ones. Doing that manually for every
>>>>>> worker could be a pain though.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-13/
>>>>>> [2]
>>>>>> https://stackoverflow.com/questions/26553612/how-do-i-disable-a-node-in-jenkins-ui-after-it-has-completed-its-currently-runni
>>>>>>
>>>>>> On Wed, Sep 22, 2021 at 1:03 PM Daniel Oliveira <
>>>>>> danolive...@google.com> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I'm aiming at upgrading the version of Go on our Jenkins VMs, and I
>>>>>>> found these instructions on upgrading software on Jenkins
>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers>
>>>>>>> on our cwiki.
>>>>>>>
>>>>>>> I haven't started going through it yet, but I was wondering about
>>>>>>> the last few steps that involve stopping VMs, deleting boot disks, and
>>>>>>> restarting executors. Is there some best practice for that section to
>>>>>>> avoid causing interruptions in our automated testing? Should I be trying
>>>>>>> to do this outside of peak dev hours, or going one VM at a time so
>>>>>>> others can pick up extra load, or anything like that?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniel Oliveira
>>>>>>>
>>>>>>
