SGTM. Thank you very much Daniel!

On Mon, Oct 11, 2021, 7:51 PM Ahmet Altay <al...@google.com> wrote:
> Thank you Daniel. Could you please update the wiki once you are done with
> the process?
>
> On Mon, Oct 11, 2021 at 6:22 PM Daniel Oliveira <danolive...@google.com>
> wrote:
>
>> Took me a bit to get to this, sorry. I finally figured out an approach
>> for updating Go, did so, and will be updating the image momentarily.
>>
>> I think a more important note is that I tried what Valentyn was
>> considering, which is SSHing into workers and updating the dependency. I'll
>> describe the process below, but the summary is that I did it on one worker
>> with Go so far, saw no problems over the weekend, and would like to
>> continue updating the rest of the workers if there are no objections.
>>
>> Here's a step-by-step of what I did. If we decide to stick with this
>> approach, these instructions can be added to Confluence:
>>
>> 1. Go to the page for the Jenkins agent you want to update [1] and click
>>    "Mark this node temporarily offline", leaving a reason such as
>>    "Updating X dependency."
>> 2. Wait until there are no more tests running on that agent (under "Build
>>    Executor Status" on the left of the page).
>> 3. SSH into the agent and perform the update.
>> 4. Mark the node as online again.
>> 5. Repeat for every worker.
>>
>> And these are some additional steps if you want to immediately run a test
>> suite to check that the update worked correctly. For example, in my case I
>> wanted to check against the Go Postcommit, and it was a good thing I did,
>> because it actually failed the first time and I had to go back in to fix a
>> small oversight I made. So doing this after you update your first worker is
>> probably a good idea before updating the rest:
>>
>> 1. Go to the page for the job you want to run (for example: [2]).
>> 2. Click "Configure" on the left menu.
>> 3. Find the checkbox "Restrict where this project can be run" and change
>>    the restriction from "beam" to the specific name of the agent (ex.
>>    "apache-beam-jenkins-1").
>> 4. Save and apply that change.
>> 5. Back on the page for the job, click "Build with Parameters" on the
>>    left menu.
>> 6. Run the build on "master".
>> 7. Once you're done checking the results, change the restriction for the
>>    job back to "beam". (This also gets reset once every 24 hours in case
>>    you forget.)
>>
>> I did that on one agent (apache-beam-jenkins-2) on Friday evening when it
>> wasn't too busy, and got Go updated and working. I checked that agent's
>> execution history again today just in case, and it was healthy over
>> the weekend, with no Go-related problems as far as I could see. If there
>> are no objections I'd like to go ahead and continue updating the rest of
>> the workers (I'll do this late at night or over the weekend to avoid
>> disrupting dev work).
>>
>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-1/
>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Go/
>>
>> On Mon, Oct 4, 2021 at 6:14 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> I updated the image in [1], but have not yet changed the workers to pick
>>> up the new image. We can do this once we add Go changes on top of it.
>>>
>>> I am also considering SSHing into every worker and running a one-line
>>> command that adds the dependency that was missing. It seems to be low risk,
>>> and there is a fall-back plan to re-start the worker using the saved image
>>> - both new and old images are saved and available in Cloud Console.
>>>
>>> Ideally, we should find a way to do a rolling upgrade that a PMC or
>>> committer could trigger without logging into every machine.
>>>
>>> [1]
>>> https://issues.apache.org/jira/browse/BEAM-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424228#comment-17424228
>>>
>>> On Wed, Sep 22, 2021 at 3:28 PM Daniel Oliveira <danolive...@google.com>
>>> wrote:
>>>
>>>> @Brian Hulette <bhule...@google.com> That button seems like exactly
>>>> what we'd need. Doing it manually would be a pain, but it's probably still
>>>> preferable to causing a bunch of aborted tests.
>>>>
>>>> @Valentyn Tymofieiev <valen...@google.com> Collaborating to do both
>>>> updates at once is a great idea! I'll message you directly about it.
>>>>
>>>> On Wed, Sep 22, 2021 at 2:44 PM Valentyn Tymofieiev
>>>> <valen...@google.com> wrote:
>>>>
>>>>> I am also interested in updating the version of Python on the VMs; I
>>>>> need to install Python 3.9. Thanks for looking into this. We can
>>>>> coordinate to make one update instead of two.
>>>>>
>>>>> On Wed, Sep 22, 2021 at 2:40 PM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I'm not sure about best practices here. Out of curiosity I just poked
>>>>>> around in the Jenkins UI (e.g. [1]) and it looks like you can manually
>>>>>> "Mark node temporarily offline" when logged in (if you're a committer).
>>>>>> According to [2] this will prevent it from picking up new jobs after it's
>>>>>> finished the currently executing ones. Doing that manually for every
>>>>>> worker could be a pain though.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-13/
>>>>>> [2]
>>>>>> https://stackoverflow.com/questions/26553612/how-do-i-disable-a-node-in-jenkins-ui-after-it-has-completed-its-currently-runni
>>>>>>
>>>>>> On Wed, Sep 22, 2021 at 1:03 PM Daniel Oliveira
>>>>>> <danolive...@google.com> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I'm aiming at upgrading the version of Go on our Jenkins VMs, and I
>>>>>>> found these instructions on upgrading software on Jenkins
>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers>
>>>>>>> on our cwiki.
>>>>>>>
>>>>>>> I haven't started going through it yet, but I was wondering about
>>>>>>> the last few steps that involve stopping VMs, deleting boot disks, and
>>>>>>> restarting executors. Is there some best practice for that section to
>>>>>>> avoid causing interruptions in our automated testing? Should I be
>>>>>>> trying to do this outside of peak dev hours, or going one VM at a time
>>>>>>> so others can pick up extra load, or anything like that?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniel Oliveira
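
For reference, the per-agent procedure described in this thread (mark the node offline, wait for running tests to finish, update, mark it online, repeat for every worker) could be sketched as a small loop. This is only an illustration, not something anyone in the thread wrote or ran. `AgentClient`, `FakeClient`, and `rolling_update` are hypothetical names, and how `set_offline`, `is_idle`, and the `update` callback would actually talk to Jenkins or SSH into a worker is deliberately left open.

```python
import time
from typing import Callable, Iterable, List, Protocol


class AgentClient(Protocol):
    """Hypothetical interface; how it reaches Jenkins is left open."""

    def set_offline(self, agent: str, offline: bool, reason: str) -> None: ...

    def is_idle(self, agent: str) -> bool: ...


def rolling_update(
    agents: Iterable[str],
    client: AgentClient,
    update: Callable[[str], None],
    reason: str = "Updating X dependency.",
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Update agents one at a time and return the ones that were updated."""
    done: List[str] = []
    for agent in agents:
        client.set_offline(agent, True, reason)  # 1. mark temporarily offline
        while not client.is_idle(agent):         # 2. wait for running tests
            sleep(poll_seconds)
        update(agent)                            # 3. perform the update
        client.set_offline(agent, False, "")     # 4. mark online again
        done.append(agent)                       # 5. move on to the next one
    return done


class FakeClient:
    """In-memory stand-in used only to exercise the loop."""

    def __init__(self, busy_polls: int = 2) -> None:
        self.busy_polls = busy_polls
        self.remaining: dict = {}
        self.log: list = []

    def set_offline(self, agent: str, offline: bool, reason: str) -> None:
        self.log.append((agent, "offline" if offline else "online"))
        if offline:
            self.remaining[agent] = self.busy_polls

    def is_idle(self, agent: str) -> bool:
        if self.remaining[agent] > 0:
            self.remaining[agent] -= 1
            return False
        return True


if __name__ == "__main__":
    client = FakeClient()
    updated = rolling_update(
        ["apache-beam-jenkins-1", "apache-beam-jenkins-2"],
        client,
        update=lambda agent: client.log.append((agent, "updated")),
        sleep=lambda seconds: None,
    )
    print(updated)
    print(client.log)
```

If `update` raises, the loop stops and the agent stays offline, so a worker with a failed update does not pick up new jobs until someone has looked at it. That is a design choice of this sketch, not something the thread settled on.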