FYI, it looks like all the Go tests are now failing because the workers
can't find the go command at all.
Did a Jenkins image without Go (v1.16+) pre-installed get pushed?
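
In case it helps anyone checking a worker by hand, here is a minimal sketch
of the kind of check I have in mind. It assumes Python 3 is available on the
worker; the names parse_go_version and check_go are made up for this example
and are not part of any Beam tooling. It only looks for go on PATH and
compares the reported version against 1.16.

```python
import re
import shutil
import subprocess

# Minimum Go version mentioned above (v1.16+).
MIN_GO = (1, 16)


def parse_go_version(output):
    """Extract (major, minor) from `go version` output, or None if not found."""
    match = re.search(r"\bgo(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_go(minimum=MIN_GO):
    """Return (ok, message) saying whether a new enough `go` is on PATH."""
    go_path = shutil.which("go")
    if go_path is None:
        return False, "go not found on PATH"
    result = subprocess.run(
        [go_path, "version"], capture_output=True, text=True, check=False
    )
    version = parse_go_version(result.stdout)
    if version is None:
        return False, "could not parse version from: %r" % result.stdout.strip()
    if version < minimum:
        return False, "found go %d.%d at %s, need %d.%d or newer" % (
            version + (go_path,) + minimum
        )
    return True, "found go %d.%d at %s" % (version + (go_path,))
```

Calling check_go() over SSH on a worker returns a (bool, message) pair, so
it should be easy to tell a missing binary apart from one that is too old.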

On Mon, Oct 18, 2021, 1:45 PM Valentyn Tymofieiev <[email protected]>
wrote:

> Thanks Daniel,
>
> I can recreate the VMs on new disks.
>
> We currently have a set of stopped Jenkins workers (named
> apache-beam-jenkins-##) and a set of running workers (named
> apache-ci-beam-jenkins-##).
>
> Are there any concerns about deleting the stopped group of workers?
>
>
>
> On Mon, Oct 18, 2021 at 11:19 AM Ahmet Altay <[email protected]> wrote:
>
>> Thank you Daniel, Valentyn!
>>
>> On Mon, Oct 18, 2021 at 8:02 AM Daniel Oliveira <[email protected]>
>> wrote:
>>
>>> I performed a light update of both Go and Python (from Valentyn's
>>> update) on each worker VM over the weekend. I also added instructions
>>> for the light update to Confluence (as an alternative to the current
>>> instructions).
>>>
>>> There is still reason to perform a full update at some point: Valentyn
>>> updated the VM image from 500 GB to 1000 GB of storage, which requires a
>>> full update to actually take effect.
>>>
>>> On Tue, Oct 12, 2021 at 10:32 AM Valentyn Tymofieiev <
>>> [email protected]> wrote:
>>>
>>>> > 3. SSH into the agent and perform the update.
>>>> So, this would be a 'lite' version of the update, where we make changes
>>>> to the live worker without recreating the worker VM with a new image? We
>>>> could perhaps document both options, and also make it clear that
>>>> producing a VM image that has the necessary updates is mandatory even if
>>>> we perform 'lite' updates without recreating the worker.
>>>> Also, for a lite update, marking the Jenkins worker offline may be
>>>> optional, as some updates might not be disruptive (such as installing
>>>> some software that will not be used immediately).
>>>>
>>>>
>>>>
>>>> On Mon, Oct 11, 2021 at 7:53 PM Robert Burke <[email protected]>
>>>> wrote:
>>>>
>>>>> SGTM. Thank you very much Daniel!
>>>>>
>>>>> On Mon, Oct 11, 2021, 7:51 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> Thank you Daniel. Could you please update the wiki once you are done
>>>>>> with the process?
>>>>>>
>>>>>> On Mon, Oct 11, 2021 at 6:22 PM Daniel Oliveira <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Took me a bit to get to this, sorry. I finally figured out an
>>>>>>> approach for updating Go, applied it, and will be updating the image
>>>>>>> shortly.
>>>>>>>
>>>>>>> I think a more important note is that I tried what Valentyn was
>>>>>>> considering, which is SSHing into workers and updating the dependency.
>>>>>>> I'll describe the process below, but the summary is that I did it on
>>>>>>> one worker with Go so far, saw no problems over the weekend, and would
>>>>>>> like to continue updating the rest of the workers if there are no
>>>>>>> objections.
>>>>>>>
>>>>>>> Here's a step-by-step of what I did. If we decide to stick with this
>>>>>>> approach, these instructions can be added to Confluence:
>>>>>>>
>>>>>>> 1. Go to the page for the Jenkins agent you want to update [1] and
>>>>>>> click "Mark this node temporarily offline", leaving a reason such as
>>>>>>> "Updating X dependency."
>>>>>>> 2. Wait until there are no more tests running in that agent (under
>>>>>>> "Build Executor Status" on the left of the page).
>>>>>>> 3. SSH into the agent and perform the update.
>>>>>>> 4. Mark the node as online again.
>>>>>>> 5. Repeat for every worker.
>>>>>>>
>>>>>>> And these are some additional steps if you want to immediately run a
>>>>>>> test suite to check that the update worked correctly. For example, in
>>>>>>> my case I wanted to check against the Go Postcommit, and it was a good
>>>>>>> thing I did, because it actually failed the first time and I had to go
>>>>>>> back in to fix a small oversight I made. So doing this after you update
>>>>>>> your first worker is probably a good idea before updating the rest:
>>>>>>>
>>>>>>> 1. Go to the page for the job you want to run (for example: [2]).
>>>>>>> 2. Click "Configure" on the left menu.
>>>>>>> 3. Find the checkbox "Restrict where this project can be run" and
>>>>>>> change the restriction from "beam" to the specific name of the agent
>>>>>>> (ex. "apache-beam-jenkins-1").
>>>>>>> 4. Save and apply that change.
>>>>>>> 5. Back on the page for the job, click "Build with Parameters" on
>>>>>>> the left menu.
>>>>>>> 6. Run the build on "master".
>>>>>>> 7. Once you're done checking the results, change the restriction for
>>>>>>> the job back to "beam". (This also gets reset once every 24 hours in
>>>>>>> case you forget.)
>>>>>>>
>>>>>>> I did that on one agent (apache-beam-jenkins-2) on Friday evening
>>>>>>> when it wasn't too busy, and got Go updated and working. I checked that
>>>>>>> agent's execution history again today just in case, and it was healthy
>>>>>>> over the weekend, with no Go-related problems as far as I could see. If
>>>>>>> there are no objections I'd like to go ahead and continue updating the
>>>>>>> rest of the workers (I'll do this late at night or over the weekend to
>>>>>>> avoid disrupting dev work).
>>>>>>>
>>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-1/
>>>>>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Go/
>>>>>>>
>>>>>>> On Mon, Oct 4, 2021 at 6:14 PM Valentyn Tymofieiev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I updated the image in [1], but did not change the workers to pick
>>>>>>>> up the new image yet. We can do this once we add the Go changes on top
>>>>>>>> of it.
>>>>>>>>
>>>>>>>> I am also considering SSHing into every worker and running a one-line
>>>>>>>> command that adds the dependency that was missing. It seems to be low
>>>>>>>> risk, and there is a fallback plan to restart the worker using the
>>>>>>>> saved image: both new and old images are saved and available in Cloud
>>>>>>>> Console.
>>>>>>>>
>>>>>>>> Ideally, we should find a way to do a rolling upgrade that a PMC
>>>>>>>> member or committer could trigger without logging into every machine.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424228#comment-17424228
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 22, 2021 at 3:28 PM Daniel Oliveira <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> @Brian Hulette <[email protected]> That button seems like
>>>>>>>>> exactly what we'd need. Doing it manually would be a pain, but it's
>>>>>>>>> probably still preferable to causing a bunch of aborted tests.
>>>>>>>>>
>>>>>>>>> @Valentyn Tymofieiev <[email protected]> Collaborating to do
>>>>>>>>> both updates at once is a great idea! I'll message you directly about
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> On Wed, Sep 22, 2021 at 2:44 PM Valentyn Tymofieiev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I am also interested in updating the version of Python on the VMs;
>>>>>>>>>> I need to install Python 3.9. Thanks for looking into this. We can
>>>>>>>>>> coordinate to make one update instead of two.
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 22, 2021 at 2:40 PM Brian Hulette <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm not sure about best practices here. Out of curiosity I just
>>>>>>>>>>> poked around in the Jenkins UI (e.g. [1]) and it looks like you can
>>>>>>>>>>> manually "Mark node temporarily offline" when logged in (if you're a
>>>>>>>>>>> committer). According to [2] this will prevent it from picking up
>>>>>>>>>>> new jobs after it's finished the currently executing ones. Doing
>>>>>>>>>>> that manually for every worker could be a pain though.
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-13/
>>>>>>>>>>> [2]
>>>>>>>>>>> https://stackoverflow.com/questions/26553612/how-do-i-disable-a-node-in-jenkins-ui-after-it-has-completed-its-currently-runni
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 22, 2021 at 1:03 PM Daniel Oliveira <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm aiming to upgrade the version of Go on our Jenkins VMs,
>>>>>>>>>>>> and I found these instructions on upgrading software on Jenkins
>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers>
>>>>>>>>>>>> on our cwiki.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't started going through it yet, but I was wondering
>>>>>>>>>>>> about the last few steps that involve stopping VMs, deleting boot
>>>>>>>>>>>> disks, and restarting executors. Is there some best practice for
>>>>>>>>>>>> that section to avoid causing interruptions in our automated
>>>>>>>>>>>> testing? Should I be trying to do this outside of peak dev hours,
>>>>>>>>>>>> or going one VM at a time so others can pick up extra load, or
>>>>>>>>>>>> anything like that?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Daniel Oliveira
>>>>>>>>>>>>
>>>>>>>>>>>