Thank you for sharing the plan, Shane. It's great!

I'll actively help if any issues come up during the migration.

To share the current status on the new target OS (Ubuntu 20): the Apache
Spark master branch has already been migrated to Ubuntu 20.04 in the
GitHub Actions environment.

    [SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04
    [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs
    [SPARK-33239][INFRA] Use pre-built image at GitHub Action SparkR job

For PySpark/SparkR testing, we may be able to take advantage of the
pre-built image in the Jenkins environment too, because it isolates
package installation from testing and so should remove installation
flakiness from the test runs.

Also, to prepare for the migration of `branch-3.0/branch-2.4`, we can
backport the above patches (SPARK-33156/SPARK-33162/SPARK-33239) to
`branch-3.0/branch-2.4`.

Bests,
Dongjoon.


On Mon, Nov 2, 2020 at 1:16 PM shane knapp ☠ <skn...@berkeley.edu> wrote:

> TL;DR:  our build system is ancient, EOLed and about to get hit hard w/a
> secops hammer.  we need to literally reinstall the entire cluster from
> scratch and get things working.
>
> here are the high level bullet points about what's coming up in the next
> month:
>
> ** all amp-jenkins-worker-* nodes are running centos 6, and the remainder
> ubuntu 16.  these will be upgraded to ubuntu 20.
>
> i will be doing this in stages so as to minimize downtime.
>
> ** ALL BUILDS NEED TO BE PORTED TO UBUNTU 20.  i can ensure that the
> environments on the nodes are identical, but i have not yet been able to
> successfully build any SBT jobs on any version of ubuntu, and the MVN
> builds won't run on ubuntu 18 (tho they work fine on 16).  i also have had
> difficulty getting the PRB job to successfully finish on ubuntu.
>
> for this, i will definitely need help from the dev community to get things
> working...  and the speed at which things are fixed will be directly
> proportional to how much help i get.  :)
>
> ** amplab jenkins primary node will need two major upgrades:  OS from
> centos 6 to ubuntu 20, and jenkins from 1.6 to 2.X LTS...
>
> i'm most concerned about this, as it is literally the exact same jenkins
> installation that patrick wendell set up over 10 years ago.  there are many
> publish secrets that are entered into the jenkins config and i'd really
> hope that we don't lose them.
>
> my plan here is to upgrade the current jenkins, and fix any things that
> break.  then we'll rsync jenkins' homedir to the new primary node and hope
> that works.  :)
>
> ** user audits
>
> UC berkeley's new security standards require quarterly audits of
> non-affiliated accounts...  this will impact only a few people on this
> list, but i'll need to work w/campus and our department on solutions for
> this other than local accounts on the servers.
>
> a LOT is going to happen, and i'm meeting w/my team today and will come up
> w/a basic plan.  we will definitely experience downtime during this, but i
> cannot guess as to what that will look like.
>
> this might also be a good time to talk about the future of the build
> system, auditing our builds (do we need SBT?), or even finally getting
> around to dockerizing everything so i don't need such a fragile and
> non-atomic set of worker nodes specifically for spark.
>
> thoughts?  comments?
>
> shane
>
> ps -- this is one of the reasons why i haven't been around much lately...
> it's been really tough keeping things up to date while trying to remotely
> train up one of my sysadmins to take over some of my build system duties.
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
