Thanks Shane.

On Thu, 26 Nov 2020, 10:19 shane knapp ☠, <skn...@berkeley.edu> wrote:
> alright, builds are looking solid except for SBT... if someone here could take a look at those failures i'd be most appreciative.
>
> the important ones: PRB, PRB-K8s, k8s, snapshot and maven builds all green!
>
> i'm literally gobsmacked by how smoothly this went. :)
>
> we're all going to enjoy a mellow holiday and i'll check build statuses every now and then and see if i find anything else like this:
> https://issues.apache.org/jira/browse/SPARK-33565
>
> have a great holiday everyone! we'll start getting the new primary set up on monday, and hopefully by tuesday be fully up and running.
>
> shane
>
> On Wed, Nov 25, 2020 at 1:35 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>
>> hey all, work is going quite well and smoothly for this project.
>>
>> today's update:
>>
>> we will experience significant downtime monday/tuesday as we spin up the new primary jenkins node. until then, we'll be building over the next few days so i'll have a chance to better track down and fix any system-level build breaks.
>>
>> but most importantly, i just added 3 of the 4 new ubuntu 20.04 workers to the pool: research-jenkins-worker-03, 04 and 06. -05 is being difficult, so i'm going to let it pout in the corner for a while before hitting it again w/the ansible cannon.
>>
>> shane
>>
>> On Tue, Nov 24, 2020 at 6:08 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>>
>>> all spark builds have been ported and triggered:
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>>>
>>> not shown are the regular and k8s PRB, which are also running.
>>>
>>> i think i've nailed down most of the stupid PATH and JAVA_HOME issues, but i'm sure we'll have some stuff to work out. i'm mostly keeping an eye on the build history of research-jenkins-worker-01 and -02, as they're running the latest OS + ansible (which will be moved in to the spark repo asap).
>>>
>>> i'm still concerned about sbt failures, which includes the PRB. we'll see how things go, and just focus on getting things working on ubuntu 20 LTS. if we need to drop the ubuntu 16 workers from the pool temporarily, i would be more than happy to do that. we'll lose some capacity, but it looks like we have a solid template for getting these suckers redeployed so turn-around should be pretty quick.
>>>
>>> we also need to dedicate some time to clean up/fix our plugin configs. there's been a lot of change over the past three years and things like PRB triggers seem flaky (it took 28m instead of 5m for this job to trigger: https://github.com/apache/spark/pull/29994)
>>>
>>> this all being said, i'm really happy w/our progress so far and have started leaning towards 'cautiously optimistic'... we'll see how things go and recalibrate accordingly. i'll have a better idea of where we are tomorrow and keep the list updated.
>>>
>>> and finally: a HUGE thanks goes out to jon for the work going on at the colo this moment: rack rearrangement, cleaning up networking, fixing hardware, reimaging and generally kicking ass!
>>>
>>> have a great holiday!
>>>
>>> shane
>>>
>>> On Tue, Nov 24, 2020 at 2:24 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>>>
>>>> our very first ubuntu-based PRB is running:
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131701/
>>>>
>>>> crossing my fingers! :)
>>>>
>>>> On Tue, Nov 24, 2020 at 1:30 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>>>>
>>>>> due to scheduling, upcoming holiday and in-the-colo work requirements, all of the centos workers are being wiped NOW.
>>>>>
>>>>> this is great, as the sooner we can get started on fixing builds the better. i'm not going anywhere over the holiday, so i'll get a good head-start on things.
>>>>>
>>>>> thank you jon!
>>>>>
>>>>> shane
>>>>>
>>>>> On Tue, Nov 24, 2020 at 11:24 AM shane knapp ☠ <skn...@berkeley.edu> wrote:
>>>>>
>>>>>> this is a lengthy, but important read for everyone here.
>>>>>>
>>>>>> in the next few days, the remaining centos machines (PRB/SBT workers AND primary) will be reimaged from centos6.9 to ubuntu 20.04LTS.
>>>>>>
>>>>>> this means three important things on the very near horizon:
>>>>>> 1 -- the PRB and SBT tests WILL BE BROKEN (by thanksgiving)
>>>>>> 2 -- jenkins itself will be down for a while as we move the jenkins installation to its new home.
>>>>>> 3 -- those of you with accounts here will temporarily lose access
>>>>>>
>>>>>> regarding (1), brian (cced) will be helping me debug and fix any system-level bugs (python envs, missing packages, etc). jon (cced) will be doing the reimaging and cobbling together of hardware to keep us on our feet. their help is going to be invaluable to getting us back on the ground.
>>>>>>
>>>>>> we already have two ubuntu 20 workers up and building (research-jenkins-worker-0[1,2]), and the SparkPullRequestBuilder-K8s build is already green. i'll keep an eye on these workers to ensure i didn't miss anything.
>>>>>>
>>>>>> once we have a couple more ubuntu 20 machines up, i'll move the PRB and SBT builds there and let them fail as often as possible so we can use the build logs during the migration of the primary.
>>>>>>
>>>>>> then we shut down jenkins and move to the new primary.
>>>>>>
>>>>>> this will all be happening in the next week to week-and-a-half.
>>>>>>
>>>>>> nearish on the horizon, we need to do three things:
>>>>>> 1 -- reimage the ubuntu 16 workers
>>>>>> 2 -- clean up all of the breakages within the jenkins plugin universe. there are a lot of stacktraces everywhere after the upgrade, but things are still building so i'm inclined to push this out.
>>>>>> 3 -- fix the PRB/SBT builds.
>>>>>>
>>>>>> further off, once we're stable, we (the spark community) will need to have an honest conversation about where the build system lives. we don't currently have enough resources here to manage the system in the way that it deserves, and i can't foresee getting the staffing for long-term support any time soon.
>>>>>>
>>>>>> however, with the ansible configs (which i plan on moving to the spark repo), it should be much easier to replicate the build system.
>>>>>>
>>>>>> by this time next year, i would like to have helped find the build system a new home, and sunset jenkins. over the past 11 years (i think), this system has built spark. it's getting a little tired and needs a well deserved break. :)
>>>>>>
>>>>>> shane
>>>>>> --
>>>>>> Shane Knapp
>>>>>> Computer Guy / Voice of Reason
>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>> https://rise.cs.berkeley.edu